We consider Bayesian nonparametric density estimation using a
Pitman-Yor or a normalized inverse-Gaussian process kernel mixture
as the prior distribution for a density. The procedure is studied from a
frequentist perspective. Using the stick-breaking representation of the
Pitman-Yor process or the expression of the finite-dimensional
distributions for the normalized inverse-Gaussian process, we prove that,
when the data are replicates from an infinitely smooth density, the
posterior distribution concentrates on any shrinking $L^p$-norm ball,
$1 \le p \le \infty$, around the sampling density at a nearly parametric rate,
up to a logarithmic factor. The resulting hierarchical Bayesian
procedure, with a fixed prior, is thus shown to be adaptive to the infinite
degree of smoothness of the sampling density.

1. Introduction. Consider the problem of estimating a density on the 
real line from independent and identically distributed (i.i.d.) observations, 
taking a Bayesian nonparametric approach. A prior is defined on a metric 
space of probability measures with Lebesgue density and a summary of the 
posterior, typically the posterior expected density, can be employed as an 
estimator. Since the seminal articles of Ferguson [6] and Lo [22], the idea 
of constructing priors on spaces of densities by convolving a fixed kernel 
with a random distribution has been successfully exploited in density esti- 
mation: a kernel mixture may provide an efficient approximation scheme, 
possibly resulting in a minimax-optimal (up to a logarithmic factor) speed 
of concentration for the posterior on shrinking balls around the sampling 
density. 

Recent literature on Bayesian kernel density estimation has mainly
focussed on posterior contraction rates relative to the Hellinger or the
$L^1$-metric, using a Dirichlet process mixture of normals (or generalized
normals). Ghosal and van der Vaart [10] found a nearly parametric rate for
estimating infinitely smooth densities that are in the model, i.e., are
themselves normal mixtures, while Shen and Ghosal [31], extending the result of
Kruijer et al. [20] on finite Dirichlet mixtures, have proved that fully
rate-adaptive multivariate density estimation of ordinary smooth densities over
Hölder regularity scales can be performed using infinite Dirichlet mixtures
of Gaussians, without bandwidth shrinkage in the prior for the scale.

Even if much progress has been made during the last decade in
understanding frequentist asymptotic properties of kernel mixture models for
Bayesian density estimation, there seems to be a lack of results
concerning adaptive estimation of infinitely smooth densities with respect to more
general loss functions than the Hellinger metric, using other processes, apart
from the Dirichlet process, as priors for the mixing distribution. In this article, we
investigate how to complement and generalize existing
results on posterior contraction rates by considering adaptive estimation of
infinitely smooth densities using, e.g., either the Pitman-Yor or the
normalized inverse-Gaussian process as priors for the mixing distribution of general
kernel mixtures.

We prove that for densities with Fourier transform satisfying an
exponential moment condition, virtually implying that the characteristic function
decreases at worst at an exponential power rate, an almost parametric rate
of posterior contraction arises in all $L^p$-norms, $1 \le p \le \infty$, under different
priors (possibly) affecting only the power of the logarithm term which,
admittedly, may not be optimal. In fact, it is known that the minimax rate for
estimating an entire density $f$ such that $\sup_x |f(x + iy)| \le M \exp\{c|y|^\rho\}$,
with $\rho > 1$, is $n^{-1/2}(\log n)^{(\rho-1)/(2\rho)}$ in all $L^p$-norms, $2 \le p \le \infty$, see, e.g.,
Theorem 4.1 in Ibragimov [16], page 366, and the references therein. Such
a fast rate is roughly explainable from the fact that the spaces of analytic
functions are only slightly bigger than the finite-dimensional spaces in terms
of metric entropy.

Such results are of interest for a variety of reasons: they may constitute 
a first step, beyond the Dirichlet process, towards the study of posterior 
contraction rates for more involved process priors recently proposed in the 
literature. Also, they provide an indication on the performance of Bayes' pro- 
cedures for adaptive estimation of densities belonging to a class, extensively 
considered in the frequentist literature for nonparametric curve estimation, 
which provides an alternative to classes of densities that are only finitely 
smooth. 

The main challenge in extending the adaptation result from the ordinary
to the infinitely smooth case rests in finding a finite mixing distribution,
with a sufficiently restricted number of support points, such that the
corresponding Gaussian mixture approximates the sampling density, in
Kullback-Leibler divergence, with an exponentially small error in terms of the inverse
of the bandwidth. Such a finitely supported mixing distribution may be
found by matching the moments of an ad hoc constructed mixing density,
for which, however, the twicing kernel method used by Kruijer et al. [20]
does not seem to be well-suited because of the infinite degree of smoothness
of the true density. There seem to be limitations implicitly coming from the
kernel, which are by-passed using superkernels, e.g., the sinc kernel, whose
usefulness in density estimation has been pointed out by, among others,
Devroye [5]. The crux and a main contribution of this article is the
development of an approximation result for analytic densities with exponentially
decaying Fourier transforms, cf. Lemma 8.1. We believe this result can be of
autonomous interest as well and possibly exploited by frequentist methods
in adaptive density estimation for clustering with Gaussian mixtures, along
the lines of Maugis and Michel [23].

When assessing posterior rates, a major difficulty is the evaluation of the
prior concentration rate, calculated by bounding below the prior probability
of Kullback-Leibler type neighbourhoods by the prior probability of an
$L^1$-ball of the right dimension. For the normalized inverse-Gaussian process,
the expression of the finite-dimensional distributions is used to estimate the
probability of an $L^1$-ball as done in the literature for the Dirichlet process.
For the Pitman-Yor process, instead, we exploit the stick-breaking
representation to obtain lower bounds on the probabilities of $L^1$-balls of the mixing
weights and the locations. We expect that this technique can be applied to
other stick-breaking processes in future papers.

For the sake of clarity, the exposition is focussed on density estimation, 
but other statistical settings are implicitly covered: for example, fixed design 
linear regression with unknown error distribution, as described in Ghosal and 
van der Vaart [11], pages 205-206. Extension of these results to a multivari- 
ate setting seems imminent along the lines of Shen and Ghosal [31] and is 
not pursued here. 

The organization of the article is as follows. In Section 2, we fix the
notation and review preliminary definitions. In Section 3, we state a result
on posterior rates for general kernel mixtures, highlighting the connection
with rates of contraction of the posterior for the mixing distribution (proofs
are postponed to Section 5). Main results are reported in Section 4, where,
after investigating the achievability of the error rate $1/\sqrt{n}$, up to a
logarithmic factor, for super-smooth densities that are in the model, which helps
in developing mathematical tools, we focus on adaptive estimation of analytic
densities using infinite Gaussian mixtures. Prior estimates are given in
Section 6. Section 7 and Section 8 report the proofs of the main theorems.
Auxiliary results are collected in the Appendix.

2. Notation and preliminaries. In this section, we introduce some
notation and review definitions used throughout the article. The model is
assumed to be a location mixture,

$$f_{F,\sigma}(x) := (F * K_\sigma)(x) = \int_{\mathbb{R}} \sigma^{-1} K((x - \theta)/\sigma)\,\mathrm{d}F(\theta), \quad x \in \mathbb{R},$$

where $K$ denotes the kernel density, $\sigma$ the scale parameter and $F$ the
mixing distribution. Kernels herein considered are characterized via a condition
involving the Fourier transform. Such a condition is hereafter stated for
a generic probability density function. Let $\hat f(t) := \int_{\mathbb{R}} e^{itx} f(x)\,\mathrm{d}x$, $t \in \mathbb{R}$,
be the Fourier transform or characteristic function of $f$. We say that $f$ is
super-smooth if its characteristic function satisfies the following condition:
for constants $\rho, r > 0$ and $0 < L < \infty$,

$$I_{\rho,r}(f) := \int_{\mathbb{R}} |\hat f(t)|^2 \exp\{2(\rho|t|)^r\}\,\mathrm{d}t \le 2\pi L. \tag{2.1}$$

Condition (2.1) implies that the behaviour of $|\hat f|$ is virtually described by
$\exp\{-(\rho|t|)^r\}$ as $|t| \to \infty$. Densities with Fourier transforms satisfying
requirement (2.1) are infinitely differentiable on $\mathbb{R}$, see, e.g., Theorem 11.6.2
in Kawata [19], pages 438-439. Also, they are bounded. Setting
$C(\rho, r) := \int_0^\infty \exp\{-2(\rho t)^r\}\,\mathrm{d}t = (2\rho^r)^{-1/r}\,\Gamma(1 + 1/r)$, we have

$$\|f\|_\infty \le (2\pi)^{-1} \int_{\mathbb{R}} |\hat f(t)|\,\mathrm{d}t \le L + \pi^{-1} C(\rho, r) < \infty, \tag{2.2}$$

cf. Lemma 1 in Butucea and Tsybakov [3], page 35. Super-smooth densities
constitute a somewhat larger class than that of analytic densities, including
important examples like Gaussian, Cauchy, general symmetric stable laws,
Student's-t, distributions with characteristic functions vanishing outside a
compact, as well as their mixtures and convolutions.
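
As a numerical sanity check of the closed form for $C(\rho, r)$ entering the bound (2.2), the following Python sketch compares it with a plain midpoint-rule approximation of the integral. The function names, the truncation point `upper` and the grid size `steps` are illustrative assumptions and are not part of the article.

```python
import math


def c_closed_form(rho, r):
    """Closed form C(rho, r) = (2 * rho^r)^(-1/r) * Gamma(1 + 1/r)."""
    return (2.0 * rho ** r) ** (-1.0 / r) * math.gamma(1.0 + 1.0 / r)


def c_numeric(rho, r, upper=60.0, steps=200_000):
    """Midpoint-rule approximation of the integral of exp{-2 (rho t)^r} over (0, upper).

    The truncation at `upper` is an assumption that is harmless only when the
    integrand is negligible beyond that point (e.g. r >= 1 and moderate rho).
    """
    h = upper / steps
    return h * sum(math.exp(-2.0 * (rho * (i + 0.5) * h) ** r) for i in range(steps))


if __name__ == "__main__":
    for rho, r in [(1.0, 2.0), (0.5, 1.0)]:
        print(rho, r, c_closed_form(rho, r), c_numeric(rho, r))
```

For $r = 2$ and $\rho = 1$ both values agree with $\tfrac12\sqrt{\pi/2}$, the Gaussian integral one would expect.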

Example 2.1. Symmetric stable laws, which have characteristic
functions of the form $e^{-(\rho|t|)^r}$, $t \in \mathbb{R}$, for some $\rho > 0$ and $0 < r \le 2$, are
super-smooth. Here, $r$ is called the index of the stable law, except in the
degenerate case $\rho = 0$ where the Fourier transform is identically equal to
1 and the law is a point mass at 0. We rule out this case. Cauchy laws
Cauchy$(0, \sigma)$ are stable with $r = 1$ and $\rho = \sigma$. Normal laws $N(0, \sigma^2)$ are
stable with $r = 2$ and $\rho = \sigma/\sqrt{2}$.

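Example 2.2. Student's-t distribution with $\nu > 0$ degrees of freedom
has characteristic function verifying (2.1) for $r = 1$:

$$\hat f_\nu(t) \sim \frac{\sqrt{\pi}}{\Gamma(\nu/2)\,2^{(\nu-1)/2}}\,(\sqrt{\nu}\,|t|)^{(\nu-1)/2}\,e^{-\sqrt{\nu}\,|t|} \quad \text{as } |t| \to \infty, \tag{2.3}$$

see formula (4.8) in Hurst [15], page 5.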

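Example 2.3. Exponential Power Distributions (EPD's) with shape
parameter $p$ that is an even integer have characteristic functions
satisfying (2.1). A random variable $X$ has an EPD with location parameter
$\theta = E[X]$, shape parameter (or exponent) $p$ and scale parameter $\sigma = \sigma_p =
\{E[|X - \theta|^p]\}^{1/p}$, in symbols, $X \sim \mathrm{EPD}(\theta, \sigma, p)$, with $\theta \in \mathbb{R}$ and $\sigma, p > 0$, if
it has density $f_{\theta,\sigma,p}(x) = [2\sigma p^{1/p}\,\Gamma(1 + 1/p)]^{-1} \exp\{-(|x - \theta|/\sigma)^p/p\}$, $x \in \mathbb{R}$.
It is known from Pogány and Nadarajah [29], page 205, that, for $p > 1$,

$$\hat f_{\theta,\sigma,p}(t) = e^{it\theta}\,\frac{\sqrt{\pi}}{\Gamma(1/p)} \sum_{k=0}^{\infty} \frac{(-1)^k\,\Gamma((2k+1)/p)}{\Gamma((2k+1)/2)\,k!} \left(\frac{p^{1/p}\sigma t}{2}\right)^{2k}, \quad t \in \mathbb{R}, \tag{2.4}$$
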
where the series converges for p > 1. Proposition A.l asserts that, when p = 
2m, m G N, fe,a,2m(t) ^ e _c * , i G R, thus, the corresponding distribution 
function is analytic. 
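
The EPD density of Example 2.3 can be checked numerically. The sketch below, with hypothetical helper names `epd_density` and `integrate` and an assumed integration window, verifies that the density integrates to one and that $\sigma^p$ equals the $p$-th absolute central moment, as the parametrization states.

```python
import math


def epd_density(x, theta=0.0, sigma=1.0, p=2.0):
    """EPD(theta, sigma, p) density: exp{-(|x - theta| / sigma)^p / p} / normalizer."""
    normalizer = 2.0 * sigma * p ** (1.0 / p) * math.gamma(1.0 + 1.0 / p)
    return math.exp(-((abs(x - theta) / sigma) ** p) / p) / normalizer


def integrate(g, lo, hi, steps=100_000):
    """Midpoint rule on [lo, hi]; the window must cover the bulk of the mass."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (i + 0.5) * h) for i in range(steps))


if __name__ == "__main__":
    sigma, p = 1.5, 4.0
    mass = integrate(lambda x: epd_density(x, 0.0, sigma, p), -20.0, 20.0)
    moment = integrate(lambda x: abs(x) ** p * epd_density(x, 0.0, sigma, p), -20.0, 20.0)
    print(mass, moment, sigma ** p)
```

With $p = 2$ and $\sigma = 1$ the function reduces to the standard normal density, which gives a quick consistency check of the normalizing constant.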
Example 2.4. Densities with characteristic functions vanishing outside
a symmetric convex compact set are super-smooth. Let $A$ be a
symmetric convex compact set in $\mathbb{R}^k$, $k \ge 1$. Denote by $\mathcal{E}_A$ the class of densities
with characteristic functions equal to 0 outside $A$. The set $\mathcal{E}_A$ is essentially
infinite-dimensional; nevertheless, for $p \ge 2$,
$\inf_{\hat f_n} \sup_{f \in \mathcal{E}_A} E_f[\|\hat f_n - f\|_p^s] \le c_s n^{-s/2}$.
Moreover, for $p = s = 2$, a precise asymptotic bound holds,
see Hasminskii and Ibragimov [14], page 1008, and the references therein.
Let $A = [-T, T]$, for $0 < T < \infty$. For any $f \in \mathcal{E}_A$, it is $I_{\rho,r}(f) \le 2\pi L$ for
every $\rho, r > 0$ and $L \ge \pi^{-1} T \exp\{2(\rho T)^r\}$. The Fejér-de la Vallée-Poussin
density $f(x) = (2\pi)^{-1}[(x/2)^{-1} \sin(x/2)]^2$, having $\hat f(t) = (1 - |t|)_+$,
$t \in \mathbb{R}$, is the typical example of density in $\mathcal{E}_A$, for $A = [-1, 1]$.

Given the model $f_{F,\sigma} = F * K_\sigma$, a prior is constructed on the space
of Lebesgue univariate densities by putting priors on the scale and the
mixing distribution. The scale parameter is assumed to be distributed,
independently of $F$, according to a distribution $G$ on $(0, \infty)$. Let $\Pi \otimes G$
denote the overall prior on $\mathscr{M}(\mathbb{R}) \times (0, \infty)$, where $\mathscr{M}(\mathbb{R})$ stands for the
set of probability measures on $\mathbb{R}$. Then, $\Pi \otimes G$ induces a prior on $\mathscr{F} :=
\{f_{F,\sigma} : (F, \sigma) \in \mathscr{M}(\mathbb{R}) \times (0, \infty)\}$ via the mapping $(F, \sigma) \mapsto f_{F,\sigma}$. We
assume that $\mathscr{F}$ is equipped with a metric $d$, either the Hellinger $d_H(f, g) :=
(\int (f^{1/2} - g^{1/2})^2\,\mathrm{d}\lambda)^{1/2}$, where $\lambda$ denotes Lebesgue measure on $\mathbb{R}$, or the one
induced by the $L^p$-norm, $\|f - g\|_p := (\int |f - g|^p\,\mathrm{d}\lambda)^{1/p}$, $1 \le p < \infty$, the
sup-norm being defined as $\|f - g\|_\infty := \sup_{x \in \mathbb{R}} |f(x) - g(x)|$. We study rates of
contraction for the posterior of $\Pi \otimes G$, assuming that $X^{(n)} := (X_1, \ldots, X_n)$
are i.i.d. observations from $f_0$, which may or may not be itself a kernel
mixture. A sequence $\delta_n \to 0$, as $n \to \infty$, is said to be an upper bound on the
posterior rate of convergence relative to a metric $d$ if, for some constant
$0 < M < \infty$,

$$(\Pi \otimes G)(\{f_{F,\sigma} : d(f_{F,\sigma}, f_0) > M\delta_n\} \mid X^{(n)}) \to 0,$$

$P_0^\infty$-almost surely or in $P_0^n$-probability, where $P_0$ stands for the true probability
measure. In the next section, we provide a result on posterior rates of contraction
for mixture models with super-smooth kernels.

3. Posterior rates of contraction for kernel mixtures. We derive
a theorem for assessing rates of posterior contraction for kernel mixtures in
terms of the prior concentration rate only. The assertion yields
minimax-optimal rates, up to a log-factor, for every $L^p$-norm, $2 \le p \le \infty$, when
the prior concentration rate is nearly parametric. Moreover, it relaxes the
condition on the prior for the scale to a condition involving only the lower
tail by requiring an exponential decay at zero.

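(P) The prior distribution $G$ for $\sigma$ satisfies the tail condition $G(\sigma) \lesssim
e^{-d\sigma^{-\gamma}}$ as $\sigma \to 0$, for some constants $d > 0$ and $1 \le \gamma \le \infty$.

The assertion of the theorem holds for any density that is well approximated
by its convolution with the sinc kernel

$$\operatorname{sinc}(x) := \begin{cases} \dfrac{\sin x}{\pi x}, & \text{if } x \ne 0, \\[1ex] \dfrac{1}{\pi}, & \text{if } x = 0. \end{cases}$$

This is an unconventional kernel, in the sense that it may take negative
values. It is Riemann integrable, $\int \operatorname{sinc}(x)\,\mathrm{d}x = 1$, but not Lebesgue integrable,
$\operatorname{sinc} \notin L^1(\mathbb{R})$. Also, $\widehat{\operatorname{sinc}}(t) = \mathbb{1}_{[-1,1]}(t)$, $t \in \mathbb{R}$. To state the theorem, let
$B_{KL}(f_0; \varepsilon_n^2) := \{f_{F,\sigma} : E_0[\log(f_0/f_{F,\sigma})] \le \varepsilon_n^2,\ E_0[(\log(f_0/f_{F,\sigma}))^2] \le \varepsilon_n^2\}$.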

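Theorem 3.1. Let $K$ be a probability density with characteristic
function satisfying (2.1) for some constants $\rho, r > 0$ and $0 < L < \infty$. Let
$\varepsilon_n$ be a sequence such that $\varepsilon_n \to 0$ and $n\varepsilon_n^2 \to \infty$ as $n \to \infty$. For every
$2 \le p \le \infty$, define $\delta_n := \varepsilon_n (n\varepsilon_n^2)^{(1 - 1/p)/2}$. Suppose that, for $f_0$ such
that $\|f_0 * \operatorname{sinc}_{2^{-J_n}} - f_0\|_p = O(\delta_n)$, with $2^{J_n} = O(n\varepsilon_n^2)$,

$$(\Pi \otimes G)(B_{KL}(f_0; \varepsilon_n^2)) \gtrsim \exp\{-Cn\varepsilon_n^2\} \quad \text{for some constant } C > 0, \tag{3.1}$$

where $\Pi$ is a prior for $F$ and $G$ a prior for $\sigma$ satisfying assumption (P),
with $\gamma > 1$ such that $n\varepsilon_n^2 \gtrsim (\log n)^{1/[r(1 - 1/\gamma)]}$. If $\delta_n \to 0$ as $n \to \infty$, then
there exists a constant $0 < M < \infty$ such that

$$(\Pi \otimes G)(\{f_{F,\sigma} : \|f_{F,\sigma} - f_0\|_p > M\delta_n\} \mid X^{(n)}) \to 0 \quad \text{in } P_0^n\text{-probability}.$$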
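
To see what the definition of $\delta_n$ in Theorem 3.1 gives for a nearly parametric prior concentration rate, take $\varepsilon_n = n^{-1/2}(\log n)^{\mu}$. Then $n\varepsilon_n^2 = (\log n)^{2\mu}$ and hence $\delta_n = n^{-1/2}(\log n)^{\mu(2 - 1/p)}$, so only the power of the logarithm changes with $p$. The Python sketch below merely evaluates this arithmetic; the function name `delta_n` and the chosen values of `n`, `mu` and `p` are illustrative assumptions.

```python
import math


def delta_n(n, mu, p):
    """delta_n = eps_n * (n * eps_n^2)^((1 - 1/p) / 2) with eps_n = n^(-1/2) (log n)^mu."""
    eps = n ** -0.5 * math.log(n) ** mu
    inv_p = 0.0 if math.isinf(p) else 1.0 / p  # p = infinity corresponds to the sup-norm
    return eps * (n * eps ** 2) ** ((1.0 - inv_p) / 2.0)


if __name__ == "__main__":
    n, mu = 10 ** 6, 1.5
    for p in (2.0, 4.0, math.inf):
        inv_p = 0.0 if math.isinf(p) else 1.0 / p
        closed = n ** -0.5 * math.log(n) ** (mu * (2.0 - inv_p))
        print(p, delta_n(n, mu, p), closed)
```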

When the employed kernel has characteristic function decreasing at an
exponential power rate and $f_0$ is a kernel mixture with compactly supported
mixing distribution, the preceding theorem yields rates of contraction,
relative to the Wasserstein distance of order 1, for the posterior of the
mixing distribution. Let $(\Theta, d)$, $\Theta \subseteq \mathbb{R}$, be a measurable metric space with
the Borel $\sigma$-field. For $p \ge 1$, define the Wasserstein distance of order $p$
between any two Borel probability measures $\mu$ and $\nu$ on $\Theta$ with finite $p$th
moment (i.e., $\int_\Theta d^p(x, x_0)\,\mathrm{d}\mu(x) < \infty$ for some (and hence any) $x_0$ in $\Theta$) as
$W_p(\mu, \nu) := (\inf_{\gamma \in \Gamma(\mu,\nu)} \int_{\Theta \times \Theta} d^p(x, y)\,\mathrm{d}\gamma(x, y))^{1/p}$, where $\gamma$ runs over the
set $\Gamma(\mu, \nu)$ of all joint probability measures on $\Theta \times \Theta$ with marginal distributions
$\mu$ and $\nu$. When $p = 2$, we take $d$ to be the Euclidean distance on $\Theta$. Let
$\operatorname{diam}(\Theta) := \sup\{d(x, y) : x, y \in \Theta\}$ be the diameter of $\Theta$. From the
definition, $W_p(\mu, \nu) \in [0, \operatorname{diam}(\Theta)]$. If $\Theta$ is compact, then $\operatorname{diam}(\Theta) < \infty$.
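
For intuition on the Wasserstein distance just defined, consider the special case of two empirical measures on the real line with the same number of equally weighted atoms and $d$ the Euclidean distance. In that case the optimal coupling is known to match order statistics, so $W_p$ has a closed form. The Python sketch below implements only this special case; the function name `wasserstein_empirical` is a hypothetical helper, not notation from the article.

```python
def wasserstein_empirical(xs, ys, p=1.0):
    """W_p between the empirical measures of xs and ys (equal sizes, real-valued atoms).

    Uses the fact that, on the real line with |x - y|^p cost and p >= 1,
    the optimal coupling pairs the sorted samples.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty samples of equal size")
    if p < 1.0:
        raise ValueError("the order p must be at least 1")
    pairs = zip(sorted(xs), sorted(ys))
    cost = sum(abs(a - b) ** p for a, b in pairs) / len(xs)
    return cost ** (1.0 / p)


if __name__ == "__main__":
    mu_atoms, nu_atoms = [0.0, 2.0], [0.0, 0.0]
    print(wasserstein_empirical(mu_atoms, nu_atoms, p=1.0))
    print(wasserstein_empirical(mu_atoms, nu_atoms, p=2.0))
```

In this toy example $W_1 \le W_2 \le \operatorname{diam}(\Theta)$ when $\Theta = [0, 2]$, in line with the bounds stated above.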

Corollary 3.1. Let $K$ be a symmetric density around 0, with
characteristic function satisfying (2.1) for some constants $\rho, r > 0$ and $0 < L < \infty$
and, furthermore,

$$\text{for some constants } B, \beta > 0, \quad |\hat K(t)| \ge B \exp\{-(\beta|t|)^r\}, \quad t \in \mathbb{R}. \tag{3.2}$$

Let $f_0 = f_{F_0,1}$, with $F_0$ supported on a compact set $\Theta \subset \mathbb{R}$. Let $\Pi$ be a prior
on $\mathscr{M}(\Theta)$. If condition (3.1) is verified for a sequence $\varepsilon_n$ such that $\varepsilon_n \to 0$
and $n\varepsilon_n^2 \to \infty$ as $n \to \infty$, then, for a sufficiently large constant $0 < M < \infty$,

$$\Pi(\{F : W_1(F, F_0) \le M(\log n)^{-1/r}\} \mid X^{(n)}) \to 1 \quad \text{in } P_0^n\text{-probability}.$$

4. Posterior rates of contraction for specific priors on the
mixing. In this section, we derive rates of contraction for specific priors on the
mixing distribution, i.e., the Pitman-Yor process, which includes the
Dirichlet process as a special case, and the normalized inverse-Gaussian process.
Nearly parametric rates arise either when the sampling density is within the
model or when it is a generic analytic density.

4.1. Estimation of densities with kernel mixture representation. We
begin the analysis from the case where $f_0$ is itself a kernel mixture,

$$f_0(x) = f_{F_0,\sigma_0}(x) = (F_0 * K_{\sigma_0})(x), \quad x \in \mathbb{R},$$

where $F_0$ and $\sigma_0$ denote the true values of the mixing distribution and the
scale parameter, respectively. Results are obtained under the following
assumptions.

(A0) The kernel density $K : \mathbb{R} \to \mathbb{R}_+$ is symmetric around 0, monotone
decreasing in $|x|$ and satisfies the tail condition $K(x) \gtrsim e^{-c|x|^\kappa}$ for
large $|x|$, for some constants $c > 0$ and $0 < \kappa < \infty$.

(A1) The true mixing distribution $F_0$ satisfies the tail condition

$$F_0(\{\theta : |\theta| > t\}) \lesssim e^{-c_0 t^{\varpi}} \quad \text{for large } t > 0, \tag{4.1}$$

for some constants $c_0 > 0$ and $0 < \varpi \le \infty$.

(A2) The base measure $\alpha$ has a continuous and positive Lebesgue density
$\alpha'$ such that, for some constants $b > 0$ and $0 < \delta \le \infty$, satisfies the
tail condition

$$\alpha'(\theta) \propto e^{-b|\theta|^{\delta}} \quad \text{for large } |\theta|. \tag{4.2}$$

(A3) The prior distribution $G$ for $\sigma$ has a continuous and positive Lebesgue
density on an interval containing $\sigma_0$ and, for constants $d > 0$, $1 \le
\gamma \le \infty$ and $0 < \varrho < \infty$, satisfies $G(\sigma) \lesssim e^{-d\sigma^{-\gamma}}$ as $\sigma \to 0$ and
$1 - G(\sigma) \lesssim \sigma^{-\varrho}$ as $\sigma \to \infty$.

The case where $\varpi = \infty$ in (A1) corresponds to a compactly supported mixing
distribution. The same holds for the base measure $\alpha$ when $\delta = \infty$ in (A2).
As far as assumption (A3) is concerned, an inverse-gamma distribution on
$\sigma^2$ is an eligible prior: in fact, for a suitable $d > 0$, we have $G(\sigma) \lesssim e^{-d\sigma^{-2}}$.

Stick-breaking processes and the Pitman-Yor process. We consider the
class of stick-breaking processes which includes, as important special cases,
the Dirichlet process, the two-parameter Poisson-Dirichlet process or
Pitman-Yor process, see Pitman and Yor [27], the beta two-parameter process, see
Ishwaran and Zarepour [17], Ishwaran and James [18], and the geometric
stick-breaking process, see Mena et al. [24]. The trajectories of a
stick-breaking process $F$ can be (almost surely) represented as $F = \sum_{j=1}^{\infty} W_j \delta_{Z_j}$,
where $\delta_{Z_j}$ denotes a point mass at $Z_j$. The random variables $Z_j$, $j \in \mathbb{N}$, are
i.i.d. $\bar\alpha$, where $\bar\alpha$ is a non-atomic (i.e., $\bar\alpha(\{z\}) = 0$ for every $z \in \mathbb{R}$)
probability measure over $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ defined as $\bar\alpha := \alpha/\alpha(\mathbb{R})$, $\alpha$ being a positive
and finite measure. The random variables $W_j$, $j \in \mathbb{N}$, are independent of the
$Z_j$'s and such that $0 \le W_j \le 1$, with $\sum_{j=1}^{\infty} W_j = 1$ almost surely. Also,

$$W_1 = V_1, \qquad W_j = V_j \prod_{h=1}^{j-1} (1 - V_h), \quad j \ge 2, \tag{4.3}$$

with $V_j \mid H_j \overset{\text{ind}}{\sim} H_j$, where $H_j$ is a probability measure on $[0, 1]$. A necessary
and sufficient condition for $\sum_{j=1}^{\infty} W_j = 1$ to hold is that $\sum_{j=1}^{\infty} \log(1 -
E_{H_j}[V_j]) = -\infty$, see, e.g., Lemma 1 in Ishwaran and James [18], pages 162
and 170. Consider a stick-breaking process where, for $0 \le d < 1$ and $c > -d$,
$V_j \overset{\text{ind}}{\sim} \mathrm{Beta}(1 - d, c + dj)$, $j \in \mathbb{N}$. The parameter $c$ is called the concentration
parameter and $d$ the discount parameter. The resulting process is called
the two-parameter Poisson-Dirichlet process or Pitman-Yor process with
parameters $c$, $d$ and base measure $\bar\alpha$, denoted $F \sim \mathrm{PY}(c, d, \bar\alpha)$:

$$F = \sum_{j=1}^{\infty} \Big[ V_j \prod_{h=1}^{j-1} (1 - V_h) \Big] \delta_{Z_j}, \qquad V_j \overset{\text{ind}}{\sim} \mathrm{Beta}(1 - d, c + dj), \qquad Z_j \overset{\text{iid}}{\sim} \bar\alpha.$$

The case where $d = 0$ and $c = \alpha(\mathbb{R})$ returns the Dirichlet process $D(\alpha)$.
Note that, in both the Dirichlet process and the Pitman-Yor process, the
weights $\{V_j \prod_{h=1}^{j-1} (1 - V_h)\}_{j \ge 1}$ are the weights of the process in size-biased
order. When $c = 0$, the Pitman-Yor process reduces to a stable process.
When $c = 0$ and $d = 1/2$, the stable process is a normalized inverse-gamma
process. There are no known analytic expressions for the finite-dimensional
distributions of the Pitman-Yor process, except for the cases where $d = 0$ and $d = 1/2$. Indeed, the
Dirichlet process, the Pitman-Yor process, with $d = 1/2$, and the
normalized inverse-Gaussian process are the only known priors for which explicit
expressions of the finite-dimensional distributions are available.
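
The stick-breaking representation (4.3) with $V_j \sim \mathrm{Beta}(1 - d, c + dj)$ translates directly into a sampler for a truncated Pitman-Yor draw. The following Python sketch is a minimal illustration under stated assumptions: the truncation level `n_atoms`, the function name `pitman_yor_sticks` and the choice of a standard normal base measure in the usage example are not taken from the article.

```python
import random


def pitman_yor_sticks(c, d, n_atoms, base_sampler, rng=None):
    """Draw the first n_atoms weights and locations of F ~ PY(c, d, base).

    Returns (weights, locations, remaining), where `remaining` is the mass
    prod_{h <= n_atoms} (1 - V_h) left to the atoms beyond the truncation.
    """
    if not (0.0 <= d < 1.0) or c <= -d:
        raise ValueError("need 0 <= d < 1 and c > -d")
    rng = rng or random.Random()
    weights, locations, remaining = [], [], 1.0
    for j in range(1, n_atoms + 1):
        v = rng.betavariate(1.0 - d, c + d * j)  # V_j ~ Beta(1 - d, c + d j)
        weights.append(v * remaining)            # W_j = V_j * prod_{h < j} (1 - V_h)
        remaining *= 1.0 - v
        locations.append(base_sampler(rng))      # Z_j drawn i.i.d. from the base measure
    return weights, locations, remaining


if __name__ == "__main__":
    rng = random.Random(1)
    w, z, rest = pitman_yor_sticks(1.0, 0.25, 20, lambda g: g.gauss(0.0, 1.0), rng)
    print(sum(w), rest)
```

Setting `d = 0.0` gives the Dirichlet process sticks $V_j \sim \mathrm{Beta}(1, c)$, matching the special case mentioned above.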

In order to state the main result of the section, for $\kappa, r > 0$, let $\varpi$ be such
that

$$\max\{\kappa,\ [1 + \mathbb{1}_{(1,\infty)}(r)/(r - 1)]\} \le \varpi \le \infty \tag{4.4}$$

and let $\tau$ be defined as

$$\tau := 1 + [1/r - (1 - \mathbb{1}_{(0,\infty)}(\varpi)/\varpi)]\,\mathbb{1}_{(0,1]}(r)/2. \tag{4.5}$$

Theorem 4.1. Let $K$ be as in assumption (A0), with characteristic
function satisfying (2.1) for some constants $\rho, r > 0$ and $0 < L < \infty$. Let
$f_0 = F_0 * K_{\sigma_0}$, with

(i) $F_0$ satisfying (A1) for some constants $c_0 > 0$ and $\varpi$ as in (4.4).

Let the prior for $F$ be a $\mathrm{PY}(c, d, \bar\alpha)$, with $c > -d$ and $0 \le d < 1$. Assume
that

(ii) $\alpha$ satisfies (A2) for some constants $b > 0$ and $0 < \delta \le \infty$, with $\delta \le \varpi$
whenever $\varpi < \infty$,

(iii) $G$ satisfies (A3) for $\gamma > 1$ and $\gamma \ge \{1 - \{2r[\tau + (\tau - 1/2)\,\mathbb{1}_{(0,\infty)}(d)]\}^{-1}\}^{-1}$,
with $\tau$ as in (4.5),

then the posterior rate of convergence relative to the $L^p$-norm, $1 \le p \le \infty$,
is $n^{-1/2}(\log n)^{\mu}$ for a suitable $\mu > 0$.

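Normalized inverse-Gaussian process. We start by recalling the
definition of the normalized inverse-Gaussian (N-IG) distribution. The random
vector $(Z_1, \ldots, Z_N)$, $N \ge 2$, is said to have a N-IG distribution with
parameters $(\alpha_1, \ldots, \alpha_N)$, $\alpha_j \ge 0$ for every $j = 1, \ldots, N$ and $\alpha_j > 0$ for at
least one $j$, denoted N-IG$(\alpha_1, \ldots, \alpha_N)$, if it has probability density function
over the unit $(N - 1)$-simplex $\Delta^{N-1}$

$$f(z_1, \ldots, z_{N-1}) = \frac{e^{\sum_{j=1}^{N} \alpha_j} \prod_{j=1}^{N} \alpha_j}{2^{N/2 - 1} \pi^{N/2}} \times K_{-N/2}\big(\sqrt{A_N(z_1, \ldots, z_{N-1})}\big) \times \big(A_N(z_1, \ldots, z_{N-1})\big)^{-N/4} \times [z_1 \times \cdots \times z_{N-1} \times (1 - z_1 - \cdots - z_{N-1})]^{-3/2}, \tag{4.6}$$

where $A_N(z_1, \ldots, z_{N-1}) := \sum_{j=1}^{N-1} \alpha_j^2/z_j + \alpha_N^2/(1 - \sum_{j=1}^{N-1} z_j)$ and $K$ denotes the
modified Bessel function of the third kind.

Consider a space $\mathbb{X}$ with a $\sigma$-algebra $\mathcal{A}$ of subsets of $\mathbb{X}$. Let $\alpha$ be a
finite and positive measure on $(\mathbb{X}, \mathcal{A})$. Following Lijoi et al. [21], a random
probability measure $F$ is called a normalized inverse-Gaussian process on
$(\mathbb{X}, \mathcal{A})$, with parameter $\alpha$, denoted N-IG$(\alpha)$, if, for every finite measurable
partition $A_1, \ldots, A_N$ of $\mathbb{X}$, the probability vector $(F(A_1), \ldots, F(A_N))$ has
a N-IG distribution with parameters $(\alpha(A_1), \ldots, \alpha(A_N))$. The N-IG process
has wider support around its modes than the Dirichlet process.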
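
A N-IG vector can be simulated by normalizing independent inverse-Gaussian variables, which is the construction behind the definition in Lijoi et al. [21]. The Python sketch below assumes that the $j$-th variable is inverse-Gaussian with mean $\alpha_j$ and shape $\alpha_j^2$, and draws it with the transformation method of Michael, Schucany and Haas; the function names are hypothetical and the sampler is an illustration rather than the article's own procedure.

```python
import math
import random


def inverse_gaussian(alpha, rng):
    """One draw from the inverse-Gaussian law with mean alpha and shape alpha^2."""
    mu, lam = alpha, alpha * alpha
    y = rng.gauss(0.0, 1.0) ** 2
    # Smaller root of the quadratic associated with the chi-square transformation.
    x = mu + mu * mu * y / (2.0 * lam) - (mu / (2.0 * lam)) * math.sqrt(
        4.0 * mu * lam * y + mu * mu * y * y
    )
    # Choose between the two roots x and mu^2 / x with the appropriate probability.
    if rng.random() <= mu / (mu + x):
        return x
    return mu * mu / x


def nig_vector(alphas, rng=None):
    """Draw (Z_1, ..., Z_N) ~ N-IG(alpha_1, ..., alpha_N), all alpha_j > 0 assumed."""
    if any(a <= 0.0 for a in alphas):
        raise ValueError("this sketch assumes strictly positive parameters")
    rng = rng or random.Random()
    draws = [inverse_gaussian(a, rng) for a in alphas]
    total = sum(draws)
    return [v / total for v in draws]


if __name__ == "__main__":
    print(nig_vector([1.0, 2.0, 3.0], random.Random(42)))
```

Averaging many draws of the first coordinate should give a value close to $\alpha_1/\sum_j \alpha_j$, the prior mean of $F(A_1)$.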

Theorem 4.2. Assume the set-up and conditions of Theorem 4.1 except
that the prior for $F$ is a N-IG$(\alpha)$, where $\alpha$ is a finite and positive measure
on $\mathbb{R}$. Then, the posterior rate of convergence relative to the $L^p$-norm, $1 \le
p \le \infty$, is $n^{-1/2}(\log n)^{\mu}$ for a suitable $\mu > 0$.

The proof is the same as for the Dirichlet process and is omitted.

4.2. Adaptive estimation of analytic densities. In this section, we study
adaptive estimation of analytic densities using infinite Gaussian mixtures.
We assume that $f_0$ satisfies the following conditions, where we denote by
$C^{\omega}(\mathbb{R})$ the class of analytic functions on $\mathbb{R}$.

(a) Smoothness: $f_0 \in C^{\omega}(\mathbb{R})$ is a probability density with characteristic
function satisfying (2.1) for some constants $\rho_0 > 0$, $0 < r_0 < 2$ and
$0 < L_0 < \infty$. Furthermore, $\log f_0$ is locally Lipschitz continuous:

$$\exists\, \delta > 0 \ \text{s.t.}\ \forall\, x, y : |y - x| \le \delta, \quad |\log f_0(y) - \log f_0(x)| \le Q(x)|y - x|,$$

where Q is a non- negative polynomial function of degree at most (q— 1), 
q > 2, such that E [exp {[QpQ]^- 1 )}] < 00; 
(b) Monotonicity: $f_0$ is a strictly positive and bounded density, non-decreasing
on $(-\infty, a)$, non-increasing on $(b, \infty)$ and such that $f_0(x) \ge \ell_0 > 0$
on $[a, b]$;

(c) Tails: there exists a constant $M_0 > 0$ such that $f_0(x) \le M_0\,\phi(x)$ for
all $x \in \mathbb{R}$.

Conditions (a)-(c) are satisfied, for example, by EPD's with $p$ that is an
even integer. Even if the greatest value of $r_0$ verifying condition (2.1) may
be greater than or equal to 2, we can always take $r_0$ to be in $(0, 2)$. This will
only cause a loss in the logarithmic term of the rate, not affecting the power
of $n$.

Theorem 4.3. Suppose $f_0$ is a probability density satisfying (a)-(c) for
$q \ge 2$, $0 < r_0 < q/(3q - 2)$ and $r_0(q - 1)/q < \beta < 1 - r_0/2$. Let the
model be $f_{F,\sigma} = F * \phi_\sigma$, with $F \sim \mathrm{PY}(c, d, \bar\alpha)$ for $c > -d$ and $0 \le d < 1$.
Alternatively, let $F \sim$ N-IG$(\alpha)$. Assume that

(i) $\alpha$ satisfies (A2) for constants $b > 0$ and $0 < \delta \le 2$,
(ii) $G$ satisfies (A3) for $\gamma = 2$ and some $0 < \varrho < \infty$.

The posterior rate of convergence relative to the $L^p$-norm, $1 \le p \le \infty$, is
$\varepsilon_n = n^{-1/2}(\log n)^{\mu}$ for a suitable $\mu > 0$.

5. Proofs of Theorem 3.1 and of Corollary 3.1. The following
lemma provides an upper bound on the $L^p$-approximation error of a
density, whose Fourier transform either vanishes outside a compact or decays
exponentially fast, by its convolution with the sinc kernel. In order to state
it, we define, for any probability density $f$, the positive (possibly infinite)
constant $S_f := \sup\{|t| : |\hat f(t)| \ne 0\}$. If

• $S_f < \infty$, then $\operatorname{supp}(\hat f) \subseteq [-S_f, S_f]$,

• $S_f = \infty$, then $|\hat f(t)| > 0$ for every $t \in \mathbb{R}$.

Next, we recall some basic facts on the Fourier transform. If $\int_{\mathbb{R}} |\hat f(t)|\,\mathrm{d}t <
\infty$, then $f$ can be recovered from $\hat f$ using the inversion formula $f(x) =
(2\pi)^{-1} \int_{\mathbb{R}} e^{-itx} \hat f(t)\,\mathrm{d}t$, $x \in \mathbb{R}$. Furthermore, $f$ is continuous and bounded,
$\|f\|_\infty \le (2\pi)^{-1} \int_{\mathbb{R}} |\hat f(t)|\,\mathrm{d}t < \infty$.

Lemma 5.1. Let $f$ be a probability density on $\mathbb{R}$ with characteristic
function satisfying (2.1) for some constants $\rho, r > 0$ and $0 < L < \infty$. For any
fixed $\sigma > 0$,

• if $S_f \le 1/\sigma$, then $\|f * \operatorname{sinc}_\sigma - f\|_p = 0$, $1 \le p \le \infty$,

• if $S_f = \infty$, then $\|f * \operatorname{sinc}_\sigma - f\|_p \lesssim e^{-(\rho/\sigma)^r/2}$, $2 \le p \le \infty$.

Proof. By the inversion formula for characteristic functions,

$$(f * \operatorname{sinc}_\sigma)(x) - f(x) = -\frac{1}{2\pi} \int_{|t| > 1/\sigma} e^{-itx} \hat f(t)\,\mathrm{d}t, \quad x \in \mathbb{R}.$$

If $S_f \le 1/\sigma$, then $\int_{|t| > 1/\sigma} e^{-itx} \hat f(t)\,\mathrm{d}t = 0$ identically and $\|f * \operatorname{sinc}_\sigma - f\|_p = 0$
for every $1 \le p \le \infty$. Next, suppose $S_f = \infty$. For any function $g \in L^p$, with
$2 \le p < \infty$, we have $\|g\|_p \le C_p \|\hat g\|_q$, where $q^{-1} := (1 - p^{-1}) \in [1/2, 1)$ and $C_p$
is a constant depending only on $p$, see, e.g., Theorem 74 in Titchmarsh [32],
page 96. Hence, $\|f * \operatorname{sinc}_\sigma - f\|_p^p \le C_p \|\hat f(\widehat{\operatorname{sinc}}_\sigma - 1)\|_q^q = C_p \int_{|t| > 1/\sigma} |\hat f(t)|^q\,\mathrm{d}t$.
By the Cauchy-Schwarz inequality and the assumption that $f$ satisfies (2.1),

(5.1) / |/(t)|*d* < 

J\t\>l/cr 

< 
< 

where $\int_{1/\sigma}^{\infty} \exp\{-2(\rho t)^r\}\,\mathrm{d}t = r^{-1}(2\rho^r)^{-1/r}\,\Gamma(r^{-1}, 2(\rho/\sigma)^r)$, for $\Gamma(a, z) =
\int_z^{\infty} t^{a-1} e^{-t}\,\mathrm{d}t$, with $a, z > 0$, the upper incomplete gamma function. It is
known that $\Gamma(a, z) \sim z^{a-1} e^{-z}$ as $z \to \infty$. The case where $p = \infty$ is treated
implicitly in (5.1). □
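
The exponential smallness of the sinc approximation error can be made concrete for the standard normal density, whose characteristic function is $e^{-t^2/2}$. By the inversion formula, the sup-norm error is at most $(2\pi)^{-1}\int_{|t|>1/\sigma} e^{-t^2/2}\,\mathrm{d}t$, and this value is attained at $x = 0$. The Python sketch below evaluates that tail integral through the complementary error function and compares it with a direct numerical inversion; the function names and grid size are illustrative assumptions.

```python
import math


def normal_density(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def band_limited_normal(x, sigma, steps=20_000):
    """(f * sinc_sigma)(x) for f standard normal, via inversion over |t| <= 1/sigma."""
    cutoff = 1.0 / sigma
    h = cutoff / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += math.cos(t * x) * math.exp(-0.5 * t * t)
    return h * total / math.pi  # symmetry in t halves the integration range


def tail_bound(sigma):
    """(2 pi)^{-1} times the integral of exp(-t^2/2) over |t| > 1/sigma."""
    return math.erfc(1.0 / (sigma * math.sqrt(2.0))) / math.sqrt(2.0 * math.pi)


if __name__ == "__main__":
    for sigma in (1.0, 0.5, 0.25):
        err0 = normal_density(0.0) - band_limited_normal(0.0, sigma)
        print(sigma, err0, tail_bound(sigma), math.exp(-0.5 / sigma ** 2))
```

The last printed column, $e^{-1/(2\sigma^2)}$, dominates the tail bound, which illustrates the exponential decay in $1/\sigma$ asserted by the lemma in this Gaussian special case.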

The result can be extended to all $L^p$-metrics, $1 \le p \le \infty$, replacing the
sinc kernel with a superkernel, which is absolutely integrable. By definition,
a superkernel $L$ is a symmetric, absolutely integrable function, $\int |L(x)|\,\mathrm{d}x <
\infty$, with $\int L(x)\,\mathrm{d}x = 1$, having an absolutely integrable Fourier transform $\hat L$
(hence $L$ is bounded) with the properties that $\hat L = 1$ identically on $[-1, 1]$
and $|\hat L| < 1$ off $[-1, 1]$. The interval $[-1, 1]$ is chosen for convenience only,
any neighborhood of the origin is fine. Superkernels necessarily have infinite
support.

Lemma 5.2. Let $f$ be a probability density on $\mathbb{R}$ with characteristic
function satisfying (2.1) for some constants $\rho, r > 0$ and $0 < L_f < \infty$. Let
$v \in (0, 1]$ be such that $\int f^v(x)\,\mathrm{d}x < \infty$. For any fixed $\sigma > 0$, $\|f * L_\sigma - f\|_p \lesssim
e^{-(\rho/\sigma)^r/2}\,\mathbb{1}_{\{\infty\}}(S_f)$ for every $1 \le p \le \infty$.

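PROOF. We have $(f * L_\sigma - f)(x) = (2\pi)^{-1} \int_{|t| > 1/\sigma} e^{-itx} \hat f(t)[\hat L_\sigma(t) - 1]\,\mathrm{d}t$,
$x \in \mathbb{R}$. If $S_f \le 1/\sigma$, then the integral is identically equal to zero and $\|f *
L_\sigma - f\|_p = 0$ for every $1 \le p \le \infty$. If $S_f = \infty$, for every $2 \le p \le \infty$, repeat
the same reasoning as for the sinc kernel to conclude that $\|f * L_\sigma - f\|_p^p \le
C_p \|\hat f(\hat L_\sigma - 1)\|_q^q = C_p \int_{|t| > 1/\sigma} (|\hat f(t)||\hat L_\sigma(t) - 1|)^q\,\mathrm{d}t \lesssim C_p \int_{|t| > 1/\sigma} |\hat f(t)|^q\,\mathrm{d}t \lesssim
e^{-(\rho/\sigma)^r/2}$ because $|\hat L| \le 1$. Now, we consider the remaining cases where
$1 \le p < 2$. From Lemma 1 in Devroye [5], page 2040, and condition (2.1), since

$$\int_{|t| > 1/\sigma} |\hat f(t)|\,\mathrm{d}t \lesssim \Big( \int_{|t| > 1/\sigma} \exp\{-2(\rho|t|)^r\}\,\mathrm{d}t \Big)^{1/2} \lesssim \sigma^{-(1-r)/2} e^{-(\rho/\sigma)^r} \lesssim e^{-(\rho/\sigma)^r/2},$$

we have $\|f * L_\sigma - f\|_1 \le 2 (\int f^v(x)\,\mathrm{d}x)(\pi^{-1} \int_{|t| > 1/\sigma} |\hat f(t)|\,\mathrm{d}t)^{1-v}$, which is
exponentially small in $1/\sigma$ by the preceding display. For every
other $L^p$-metric, with $1 < p < 2$, use the interpolation inequality $\|f * L_\sigma -
f\|_p \le \max\{\|f * L_\sigma - f\|_1, \|f * L_\sigma - f\|_2\}$ to conclude that $\|f * L_\sigma - f\|_p \lesssim
e^{-(\rho/\sigma)^r/2}$. □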



Before proving Theorem 3.1, a remark is in order. If

$$\int_{\mathbb{R}} |\hat K(\sigma t)|\,\mathrm{d}t < \infty, \tag{5.2}$$

then $\int_{\mathbb{R}} |\hat f_{F,\sigma}(t)|\,\mathrm{d}t = \int_{\mathbb{R}} |\hat F(t) \hat K(\sigma t)|\,\mathrm{d}t < \infty$. If $K$ is
super-smooth, i.e., $I_{\rho,r}(K) \le 2\pi L$ for some $\rho, r > 0$ and $0 < L < \infty$, then, not
only is requirement (5.2) met, cf. (2.2), but $f_{F,\sigma}$ itself is super-smooth with
$I_{\rho\sigma,r}(f_{F,\sigma}) \le 2\pi L/\sigma$. Condition (5.2), involving only the kernel, allows one to
recover any convolution $f_{F,\sigma}$ by just inverting its Fourier transform.

PROOF of Theorem 3.1. We appeal to Theorem 2 of Giné and Nickl [13],
page 2891. Choosing $\gamma_n = 1$ for all $n \in \mathbb{N}$, we have $\delta_n = \varepsilon_n (n\varepsilon_n^2)^{(1 - 1/p)/2}$. For
$s_n = E(n\varepsilon_n^2)^{-1/\gamma}$, $E > 0$ being a suitable constant, let $\mathscr{F}_n := \{f_{F,\sigma} : F \in
\mathscr{M}(\mathbb{R}),\ \sigma \ge s_n\}$. For every $f_{F,\sigma} \in \mathscr{F}_n$, we have $I_{\rho_n,r}(f_{F,\sigma}) \le 2\pi L_n$, with
$\rho_n := \rho s_n$ and $L_n := L/s_n$. Condition 1(a), ibidem, page 2890, for the
convolution kernel case is verified for the sinc kernel. In fact, $\operatorname{sinc} \in L^\infty(\mathbb{R})$ since
$\|\operatorname{sinc}\|_\infty = 1/\pi < \infty$. Also, $\operatorname{sinc} \in L^2(\mathbb{R})$ because $\int_{\mathbb{R}} \operatorname{sinc}^2(x)\,\mathrm{d}x = 1/\pi < \infty$.
Besides, the sinc kernel is continuous and, as shown in Lemma A.4, is of
bounded quadratic variation. Let $\operatorname{sinc}_j(f) := f * \operatorname{sinc}_{2^{-j}}$. By Lemma 5.1, for
every density $f_{F,\sigma} \in \mathscr{F}_n$ for which $S_{f_{F,\sigma}} < \infty$, whatever sequence $J_n \to \infty$,
for $n$ large enough so that $2^{J_n} \ge S_{f_{F,\sigma}}$, we have $\|\operatorname{sinc}_{J_n}(f_{F,\sigma}) - f_{F,\sigma}\|_p = 0$
for every $1 \le p \le \infty$. For every density $f_{F,\sigma} \in \mathscr{F}_n$ for which $S_{f_{F,\sigma}} = \infty$,
taking $J_n$ such that $2^{J_n} = cn\varepsilon_n^2$, with $c \ge 2^{1/r}/(\rho E)$, and using the constraint
on $\gamma$, we have $\|\operatorname{sinc}_{J_n}(f_{F,\sigma}) - f_{F,\sigma}\|_p \lesssim \exp\{-(\rho s_n 2^{J_n})^r/2\} \le n^{-1} \lesssim \delta_n$.
Consequently, for $n$ large enough, $\mathscr{F}_n \subseteq \{f_{F,\sigma} : \|\operatorname{sinc}_{J_n}(f_{F,\sigma}) - f_{F,\sigma}\|_p \le
C(K)\delta_n\}$, where $C(K) > 0$ is an appropriate constant depending only on
the operator (sinc) kernel. For $E \le [(C + 4)/d]^{-1/\gamma}$, where $C > 0$ is the
constant arising in the small ball probability estimate, we have $(\Pi \otimes G)(\mathscr{F}_n^c) \le
e^{-d s_n^{-\gamma}} \le \exp\{-(C + 4)n\varepsilon_n^2\}$ and Assumption (1), ibidem, page 2891, is
fulfilled. □



Proof of Corollary 3.1. Under the stated conditions, Theorem 3.1
holds for $G$ being a point mass at 1 and $f_0 = f_{F_0,1}$. Thus, for every
$2 \le p \le \infty$, there exists a sufficiently large constant $0 < M' < \infty$ so
that $\Pi(\{F : \|f_{F,1} - f_0\|_p < M'\delta_n\} \mid X^{(n)}) \to 1$ in $P_0^n$-probability, where
$\delta_n = \epsilon_n(n\epsilon_n^2)^{(1-1/p)/2}$. Since the kernel $K$ satisfies (3.2), by Theorem 2 of
Nguyen [25], page 8, for any $F$ such that $\|f_{F,1} - f_0\|_p < M'\delta_n$,

$$W_1(F, F_0) \le W_2(F, F_0) \lesssim (-\log\|f_{F,1} - f_0\|_1)^{-1/r} \lesssim (\log n)^{-1/r},$$

where the second inequality descends from Lemma A.5 applied to
$\|f_{F,1} - f_0\|_1$. In fact, for $u > 0$ such that $E_K[|X|^u] < \infty$, the absolute
moment of order $u$ of $X$ under $f_{F,1}$ is finite for every $F \in \mathscr{M}(\Theta)$:
$E_{f_{F,1}}[|X|^u] \le (1 \vee 2^{u-1})\{E_K[|X|^u] + \int_\Theta |\theta|^u\,dF(\theta)\} < \infty$, the integral
being finite because $F$ is compactly supported on $\Theta$. Hence, for a
suitable constant $0 < M < \infty$, the inclusion
$\{F : \|f_{F,1} - f_0\|_p < M'\delta_n\} \subseteq \{F : W_1(F, F_0) \le M(\log n)^{-1/r}\}$
holds and the assertion follows. □
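The moment bound in the proof rests on the elementary inequality $|a + b|^u \le (1 \vee 2^{u-1})(|a|^u + |b|^u)$, applied with $a$ a draw from the kernel and $b$ a location. A minimal numerical sketch of that inequality follows; the helper names are hypothetical and the small relative tolerance only absorbs floating-point rounding in the equality case $a = b$.

```python
import random


def cr_constant(u: float) -> float:
    """The constant 1 v 2^(u-1) appearing in the moment bound."""
    return max(1.0, 2.0 ** (u - 1.0))


def moment_inequality_holds(a: float, b: float, u: float) -> bool:
    """Check |a + b|^u <= (1 v 2^(u-1)) (|a|^u + |b|^u) up to rounding error."""
    lhs = abs(a + b) ** u
    rhs = cr_constant(u) * (abs(a) ** u + abs(b) ** u)
    return lhs <= rhs * (1.0 + 1e-12)
```

Integrating the inequality first against the kernel and then against $F$ gives the displayed bound on $E_{f_{F,1}}[|X|^u]$.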

6. Prior estimates. As mentioned in the introduction, estimates of the 
probability of an $L^1$-ball, under different priors, are essential to evaluate the
prior mass of Kullback-Leibler type balls as in condition (3.1). While for the
N-IG process, the expression of the finite-dimensional distributions can be
used to estimate the probability of an $L^1$-ball along the lines of Lemma A.1
in Ghosal et al. [9], pages 518-519, which deals with the Dirichlet process,
for the Pitman-Yor process, we exploit the stick-breaking representation to
obtain lower bounds on the probabilities of $L^1$-balls of the mixing weights
and the locations, as given in the following two lemmas.

6.1. Pitman-Yor process.

Lemma 6.1. Let $F \sim \mathrm{PY}(c, d, \bar\alpha)$, with $c > -d$ and $0 < d < 1$. Let
$F' = \sum_{j=1}^N p_j\delta_{z_j}$, $N \ge 1$, be a finite probability measure on $\mathbb{R}$, with
$p_1 \ge p_2 \ge \ldots \ge p_N > 0$. Define $v_1 := p_1$ and
$v_j := p_j[\prod_{h=1}^{j-1}(1 - v_h)]^{-1}$, $2 \le j \le N$. For $0 < \epsilon < 1$, let
$U := (\sum_{j=1}^N\sum_{h=1}^j |V_h - v_h| \le 2\epsilon,\ \min_{1\le j\le N} V_j > \epsilon/N^2)$,
where the random variables $V_1, \ldots, V_N$ are those arising from the
stick-breaking representation (4.3). There exist constants $c_1, C > 0$
(depending only on $c$ and $d$) such that, for $(2\epsilon/N^2) < (1 - p_1)/2$,

$$P(U) \ge C\exp\{-c_1 N\max\{\log(N/\epsilon),\ dN\log(1/(1 - p_1))\}\}.$$

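Proof. If $|V_j - v_j| \le 2\epsilon/N^2$ for every $j = 1, \ldots, N$, then
$\sum_{j=1}^N\sum_{h=1}^j |V_h - v_h| \le 2\epsilon$. Thus, $U$ is implied by the event
$V := (|V_j - v_j| \le 2\epsilon/N^2,\ V_j > \epsilon/N^2,\ j = 1, \ldots, N)$. Let
$l_j := ((v_j - 2\epsilon/N^2) \vee (\epsilon/N^2))$ and $u_j := ((v_j + 2\epsilon/N^2) \wedge 1)$ for
$j = 1, \ldots, N$. By assumption, $V_j \overset{\mathrm{ind}}{\sim} \mathrm{Beta}(1 - d,\ c + dj)$
for every $j \in \mathbb{N}$, thus, by the identity $\Gamma(z + 1) = z\Gamma(z)$, $z > 0$,

$$P(V) = \prod_{j=1}^N \int_{l_j}^{u_j} \frac{\Gamma(c + 1 + d(j-1))}{\Gamma(1-d)\,\Gamma(c + dj)}\, v^{-d}(1 - v)^{c+dj-1}\,dv
\ge \frac{[\Gamma(1-d)]^{-N}\,\Gamma(c)\,c^N}{\Gamma(c + dN)} \prod_{j=1}^N \int_{l_j}^{u_j} v^{-d}(1 - v)^{c+dj-1}\,dv.$$

If $N \to \infty$ as $\epsilon \to 0$, using the formula
$\Gamma(c + dN) \sim (2\pi)^{1/2}e^{-dN}(dN)^{dN+c-1/2}$, letting $v_{\max} := \max_{1\le j\le N} v_j$,

$$P(V) \ge \frac{[\Gamma(1-d)]^{-N}\,\Gamma(c)\,c^N(\epsilon/N^2)^N}{\Gamma(c + dN)}
\times [1 - ((v_{\max} + 2\epsilon/N^2) \wedge 1)]^{(c-1)N + dN(N+1)/2}$$
$$\ge \frac{[\Gamma(1-d)]^{-N}\,\Gamma(c)\,c^N(\epsilon/N^2)^N}{\Gamma(c + dN)} \Big(\frac{1 - v_{\max}}{2}\Big)^{(c-1)N + dN(N+1)/2}$$
$$\ge C\exp\{-c_1 N\max\{\log(N/\epsilon),\ dN\log(1/(1 - v_{\max}))\}\},$$

provided $(2\epsilon/N^2) < (1 - v_{\max})/2$, where $v_{\max} \in (0, 1)$ because of the
positivity assumption on the mixing weights. The assertion follows noting
that $v_{\max} = v_1 = p_1$. □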

Remark 6.1. Because of the positivity constraint on $d$, Lemma 6.1
does not cover the case of a Dirichlet process, which can be treated using
Lemma 6.1 in Ghosal et al. [9], pages 518-519, or Lemma A.1 in Ghosal [7],
pages 1278-1279. Letting $d \to 0$, if $N = O((1/\epsilon)^\xi)$, for $\xi > 0$, we have
$P(U) \gtrsim \exp\{-c_1 N\log(1/\epsilon)\}$, which agrees with the prior estimate known
for a Dirichlet process.
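The map between mixing weights and stick-breaking fractions used in Lemma 6.1 can be illustrated with a short sketch. It assumes, as in the proof, that the sticks are independent with $V_j \sim \mathrm{Beta}(1 - d, c + dj)$ and that weights are $W_j = V_j\prod_{h<j}(1 - V_h)$; the function names are hypothetical and only the Python standard library is used.

```python
import random


def sample_sticks(c: float, d: float, n: int, rng: random.Random) -> list:
    """Draw V_1, ..., V_n independently with V_j ~ Beta(1 - d, c + d * j)."""
    return [rng.betavariate(1.0 - d, c + d * j) for j in range(1, n + 1)]


def sticks_to_weights(sticks: list) -> list:
    """W_j = V_j * prod_{h < j} (1 - V_h)."""
    weights, remaining = [], 1.0
    for v in sticks:
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights


def weights_to_sticks(weights: list) -> list:
    """Inverse map: v_1 = p_1 and v_j = p_j / prod_{h < j} (1 - v_h)."""
    sticks, remaining = [], 1.0
    for p in weights:
        v = p / remaining
        sticks.append(v)
        remaining *= 1.0 - v
    return sticks
```

Applying `weights_to_sticks` to the target weights $p_1, \ldots, p_N$ gives the fractions $v_1, \ldots, v_N$ around which the event $U$ localizes the random sticks.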

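Lemma 6.2. Let $F \sim \mathrm{PY}(c, d, \bar\alpha)$, with $c > -d$, $0 < d < 1$ and the
(un-normalized) base measure $\alpha = \alpha(\mathbb{R})\bar\alpha$ satisfying (A2) for some
constants $b > 0$ and $0 < \delta < \infty$. For $0 < \epsilon < 1$, let
$F' = \sum_{j=1}^N p_j\delta_{z_j}$, $N \ge 1$, be a finite probability measure with
$\mathrm{supp}(F') \subseteq [-a, a]$, for $a \to \infty$. Then,

$$P\Big(\sum_{j=1}^N p_j|Z_j - z_j| \le \epsilon\Big) \ge \exp\{-N[\log(\alpha(\mathbb{R})/(2\epsilon)) + ba^\delta]\}.$$

Proof. If $|Z_j - z_j| \le \epsilon$ for every $j = 1, \ldots, N$, then
$\sum_{j=1}^N p_j|Z_j - z_j| \le \epsilon$. Because $Z_1, \ldots, Z_N$ are i.i.d. $\bar\alpha$,

$$P\Big(\sum_{j=1}^N p_j|Z_j - z_j| \le \epsilon\Big) \ge \prod_{j=1}^N P(|Z_j - z_j| \le \epsilon)
\ge \exp\{-N[\log(\alpha(\mathbb{R})/(2\epsilon)) + ba^\delta]\},$$

the last inequality following from (A2) and the assumption that $a \to \infty$. □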

6.2. Normalized inverse-Gaussian process. We prove an analogue of
Lemma 6.1 in Ghosal et al. [9], pages 518-519, or Lemma A.1 in Ghosal [7],
pages 1278-1279. We provide an estimate of the probability of an $L^1$-ball
in $\mathbb{R}^N$ under the N-IG distribution. For $r > 0$, let
$B(z_0; r) := \{z \in \mathbb{R}^N : \|z - z_0\|_1 \le r\}$ be the $L^1$-closed ball centered at
$z_0$ with radius $r$.

Lemma 6.3. Let $Z := (Z_1, \ldots, Z_N)$ be distributed according to the
N-IG distribution with parameters $(\alpha_1, \ldots, \alpha_N)$. Let
$z_0 := (z_{10}, \ldots, z_{N0}) \in \Delta^{N-1}$. For $0 < \epsilon \le 1$, let
$U := (Z \in B(z_0; 2\epsilon),\ \min_{1\le j\le N} Z_j > \epsilon^2/2)$. Assume that
$A\epsilon^b \le \alpha_j \le 1$ for every $1 \le j \le N$ and some constants $A, b > 0$. If
$\min_{1\le j\le N} z_{j0} > \epsilon$, there exist constants $c, C > 0$ (depending only
on $A$, $b$ and $m := \sum_{j=1}^N \alpha_j$) such that, for $\epsilon \le 1/N$ and $N \to \infty$ as
$\epsilon \to 0$,

$$P(U) \ge C\exp\{-cN\max\{\log(1/\epsilon),\ \log(1/(\min_{1\le j\le N} z_{j0} - \epsilon))\}\}.$$

Proof. As in the proof of Lemma 6.1 in Ghosal et al. [9], pages 518-519,
we can assume that $z_{N0} \ge 1/N$. If $|z_j - z_{j0}| \le \epsilon^2$ for every
$j = 1, \ldots, N - 1$, then $\|z - z_0\|_1 \le 2\epsilon$ and $z_N > \epsilon^2 > \epsilon^2/2$. Therefore, $U$
is implied by $V := (|Z_j - z_{j0}| \le \epsilon^2,\ Z_j > \epsilon^2/2,\ j = 1, \ldots, N - 1)$. For
$l_j := ((z_{j0} - \epsilon^2) \vee (\epsilon^2/2))$ and $u_j := ((z_{j0} + \epsilon^2) \wedge 1)$,
$j = 1, \ldots, N - 1$,

$$P(V) = \int_{l_1}^{u_1}\cdots\int_{l_{N-1}}^{u_{N-1}} f(z_1, \ldots, z_{N-1})\,dz_1\cdots dz_{N-1},
\quad\text{where } f = \prod_{r=1}^4 h_r,$$

with the $h_r$'s as in (4.6). Then,

$$P(V) \gtrsim \exp\{-cN\max\{\log(1/\epsilon),\ \log(1/(\min_{1\le j\le N} z_{j0} - \epsilon))\}\},$$

where $h_1$ is bounded below using the constraint $\alpha_j \ge A\epsilon^b$, while $h_4 \ge 1$
because every $z_j \le 1$, $j = 1, \ldots, N$. To bound below $h_2$, first note that
$K_{-N/2}(\cdot) = K_{N/2}(\cdot)$ (see 9.6.6 in Abramowitz and Stegun [1], page 375).
Since, for $\epsilon$ small enough,

$$(A_N(z_1, \ldots, z_{N-1}))^{1/2} \le m^{1/2}\Big(\min_{1\le j\le N} z_{j0} - \epsilon\Big)^{-1/2} \ll (N/2 + 1)^{1/2},$$

the approximation $h_2 \sim 2^{N/2-1}\Gamma(N/2)(A_N(z_1, \ldots, z_{N-1}))^{-N/4}$ holds
(ibidem, formula 9.6.9). By Stirling's formula,
$h_2 \gtrsim e^{-N/2}m^{-N/4}(\min_{1\le j\le N} z_{j0} - \epsilon)^{N/4}$. Consequently,
$h_2 \times h_3 \gtrsim (em)^{-N/2}(\min_{1\le j\le N} z_{j0} - \epsilon)^{N/2}$. □

7. Proof of Theorem 4.1. We prove the result for the $L^1$-distance
using Theorem 2.1 of Ghosal and van der Vaart [10], page 1239. Next, we
deal with the $L^p$-metrics, for $2 \le p \le \infty$, appealing to Theorem 3.1. For the
cases where $1 < p < 2$, the result follows from the interpolation inequality
$\|f_{F,\sigma} - f_0\|_p \le \max\{\|f_{F,\sigma} - f_0\|_1,\ \|f_{F,\sigma} - f_0\|_2\} \lesssim n^{-1/2}(\log n)^\psi$, with a
suitable $\psi > 0$.
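The interpolation step can be spelled out as follows. This is the standard log-convexity of $L^p$-norms, written for a generic $g \in L^1 \cap L^2$; the exponent $\theta$ is a symbol introduced only for this sketch.

```latex
\[
  \frac{1}{p} = \frac{\theta}{1} + \frac{1-\theta}{2},
  \qquad \theta := \frac{2}{p} - 1 \in (0, 1) \quad \text{for } 1 < p < 2,
\]
\[
  \|g\|_p \le \|g\|_1^{\theta}\,\|g\|_2^{1-\theta}
          \le \max\{\|g\|_1,\ \|g\|_2\}.
\]
```

Taking $g = f_{F,\sigma} - f_0$, a posterior contraction rate that holds in both $L^1$ and $L^2$ therefore carries over to every intermediate $L^p$-metric.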

• L 1 -distance. We show that conditions (2.8) and (2.9) in Theorem 2.1 of 
Ghosal and van der Vaart [10] , page 1239, are satisfied for e n = n _1//2 (log n) x , 
with a suitable x, and e n = n" 1 / 2 (logn) T+ ( T ~ 1 / 2)1 (°'°°>^, with r as in (4.5). 
Since x > T i the posterior rate is e n := (e n V e n ) = e n . Given % G (0, 1/5), 
for constants E, F, L > to be suitably chosen, let s n = £'(log(l/?7 n )) _2 ' r / 7 , 
S n = exp{F(log(l/77 n )) 2r } and a n = L(log(l/?7 n )) 2 ' r / <5 . Using the same sieve 
set & n as in Theorem 4.1 of Scricciolo [30], pages 285-288, with S n play- 
ing the role of t n , in virtue of Lemma A. 10 and the fact that, for r > 1, 
(an/snY^- 1 ) > \og(l/ Vn ), 

(n \ I (o,i](r)+rI(i, ao )(r)/(r-l) / , \ 1+I(p, i] (r)/r 

logD( Vn , d a ) < - x log- 

Taking r/ n = e n , we have \ogD(e n , & n , da) < ?i£ 2 - As for condition (2.9), 
by assumption (ii) and the fact that 2r > 1, for appropriate choices of 
E, F, L as functions of the constant c 2 arising from the small ball probability, 
(n ® G)(J^) < e- ds " 7 + + e-^/r? 2 < exp {-(c 2 + 4)ne 2 } because, by 
Markov's inequality and the independence of {Wj}j>i and {Zj}j>\, H({F : 
F([-a, af) > V 2 /16}) < (16/ ?? 2 ) E[£°° =1 ^ W]<=(^)] < a([-a, af)/rf < 

e -ba s jtf 

• $L^p$-metrics, $2 \le p \le \infty$. We have $\delta_n = \epsilon_n(n\epsilon_n^2)^{(1-1/p)/2}$. Choose
$s_n = E(n\epsilon_n^2)^{-1/\gamma}$, with $E < [(c_2 + 4)/d]^{-1/\gamma}$, $c_2 > 0$ being the constant
arising from condition (3.1) and $d > 0$ (in this occurrence) the constant
appearing in (A3). Since $f_0 = f_{F_0,\sigma_0}$, we have $\|f_0\|_p < \infty$ and, for $n$ large
enough, $\|f_0 * \mathrm{sinc}_{2^{-J_n}} - f_0\|_p = O(\delta_n)$ because $f_0 \in \mathscr{F}_n$.

• Small ball probability. Next, we show that, for
$0 < \epsilon \le [(1/4) \wedge (\sigma_0/2)]$, there exist constants $c_1, c_2 > 0$ so that

$$(\Pi \otimes G)(B_{\mathrm{KL}}(f_0; \epsilon^2)) \ge c_1\exp\{-c_2(\log(1/\epsilon))^{2[\tau + (\tau - 1/2)1_{(0,1)}(d)]}\}.$$

A preliminary remark is in order. The case where $\varpi = \infty$ corresponds to
$F_0$ having compact support, i.e., $F_0([-a_0, a_0]) = 1$ for some $0 < a_0 < \infty$.

Let a £ := aQ {oo} ^(cQ 1 log(l/ 'e)) 1 ^ and let Fq be the re-normalized restric- 
tion of Fq to [— a £ , a £ ]. By Lemma A. 3 of Ghosal and van der Vaart [10], 
page 1261, and (4.1), ||/f* )CTo — /o||i i$ £• We show that there exists a dis- 
crete probability measure Fq on [— a e , a £ ], with at most 

(7.1) iV<(log^ 

support points, such that ||/f*,<t ~~ fF£,cr \\ao ^ £ - The support points of 
Fq can be taken to be at least 2e-separated. We distinguish the case where 
< r < 1 from the case where r > 1. In the latter case, the assertion follows 
immediately from Lemma A. 6: in fact, a £ can be taken to be large enough 
to satisfy the requirement (a £ /o~o) > 1. If < r < 1, Lemma A. 6 cannot be 
directly applied because the requirement on (a £ /ao) may not be met. Yet, 
an argument similar to the one used in Lemma 2 of Ghosal and van der 
Vaart [12], page 705, can be adopted. Consider a partition of [— a e , a £ ] into 

k = [a; {oo}(ro) (4 1 " 7{oo}{ro))/ro ( 7o)- 1 (log(l/e)) 1 / r - 1+/ ( -)( ro )/ ro l subintervals 
Ji, Ik of equal length < I < 2cro(log(l/£))~( 1 ~ r )/ r and, possibly, a 
final interval Ik+i of length < lk+i < l- Let J be the total number 
of intervals in the partition, which can be either k or k + 1. Write Fq = 
Ylj=i Fq (Ij)FQj, where F *j denotes the re-normalized restriction of Fq to 
Ij. Then, (*) = £/=i F* (Ij)f F ^., ao (x) = £/ =1 F*(/ j )(F* j *K ao )(x), 

x € R. For every j = 1, . . . , J, by Lemma A. 6 (and Remark A. 2) applied 
to every f F * . j(70 , with (a/a) = (l/2)/o~o oc (log(l/e)) _ ( 1_r )/ r , there exists a 
discrete distribution Fqj, with at most Nj < log(l/e) support points, such 
that \\f F *., ao - f F ^J\oo < e. Defined F^ := £/ =1 F * (Ij)F 0>j , we have 

II/f*,<7 - /^.obIIoo < E^i^o&OII/ffo.au - /pSj.oJIqo ^ e, where Fq" has 
at most N <(JxNj) <kx log(l/e) < (log(l/e)) 1/r+/ (°. support 
points. Combining the result on the total number A?" of support points of 
Fq in the case where < r < 1 with the one in the case where r > 1, we 
obtain the bound in (7.1). Let q > be such that [|A| 9 ] < oo. For any 
v such that (1 + q)~ l < v < 1, by Holder's inequality, J f F , dA < (1 + 
k M 9 /*?,^) dx) v < {(1 V 2^)[a«E K [\X\«} + f«l \9\*<U%(0)]} V <
this implying that ||/f* jCTo — /i^,o- lli ~ £ 1 ~ v a £ q in virtue of Lemma A. 7. 

Next, we distinguish the case where the prior for the mixing distribution
is a Pitman-Yor process with $d = 0$ and $c = \alpha(\mathbb{R})$, i.e., a Dirichlet
process, from the case where the prior for the mixing distribution is a
Pitman-Yor process with $d > 0$.

• Dirichlet process. Represented Fq as ^2f = iPjdoj, with |#j — 6^ > 2e for 
all j ^ k, and set Uj := [#j — e, 0j + e], j = 1, . . . , N, for every probability 
measure F on R such that 

iv 

(7.2) ^|F(^-)-Pil<^ 



and every a > such that |cr — er | < e, we have || /f, o- — /i^,o- 111 ~ 
||F ff - F CT0 111 + e/(a A ao) + Ei=i \F(Uj) — Pj\ < e in virtue of Lemma A.8, 
Lemma A.9 and (7.2). Thus, \\f F , a - /i^ ;0 - lli ~ 6 and ^(/f^, fo) < 
WfF,a - /^ j0 -olli + ll/i^,<To - fFZ,a || 1 + H/f*,^ - /o||i < e 1 "^" 9 . In order to 
appeal to Theorem 5 of Wong and Shen [33], pages 357-358, we show that, 
for densities in Sjv, £ '■= {/f,o- : J2f=i l-^(^i) ~~ Pj\ — e > \ a ~ a o\ < £ } an d a 
suitably chosen g G (0, 1], we have M\ := / {(/o//f >CT )> e i/e} fo(fo/fF,a) e dA = 
0((l/e) 5 ), with < £ < k/iu. For every F satisfying (7.2), F([-a £ , aj) > 
1/2, thus, by symmetry and monotonicity of K, fp i(T (x) > f® e a K a (x — 
9)dF(0) > K a {\x\ + a £ )/2, i£R. By assumption (A ), K(a £ ) > exp{-ca£} 
for a £ large enough, so that 

/ J?[Tr \ dx ~ ^( 4a -) / M*) ^ < exp {gc(4a £ /a r} 
J\x\<a £ KZ{\x\ + a £ ) J\x\<a £ 

because \a — ctq\ < e < ao/2 and ||/o||oo < oo by (2.2). Also, 
Jh 



dx 



\x\>a £ K$(\x\ +a £ ) 

< [ K-e(4\x\)[K ao (\x\/2)+F ({6 : \9\ > \x\/2})]dx < oo, 

J\x\>a e 

where the last integral is finite for a suitable choice of g and in virtue of 
the tail condition (4.1) on Fo postulated in (i). Thus, the inclusion Sn,s 
Fkl(/o; cie l ~ v a £ q (log(l/e)) 2 ) holds. To apply Lemma A. 2 of Ghosal and 
van der Vaart [10], pages 1260-1261, note that, for each $|\theta_j| \le a_\epsilon$, by
condition (4.2) prescribed in (ii), $\alpha(U_j) \gtrsim \epsilon e^{-ba_\epsilon^\delta} \gtrsim \epsilon^{b'}$ for some
constant $b' > 0$ because, when $\varpi < \infty$, we have $0 < \delta \le \varpi$ by assumption.
Thus, (3.1) is satisfied for $\tilde\epsilon_n = n^{-1/2}(\log n)^\tau$.

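• Pitman-Yor process with $d > 0$. We need to modify the arguments to
control $\|f_{F,\sigma} - f_{F_0',\sigma}\|_1$. To this aim, the stick-breaking representation
of $F$ is exploited. Let $F_0' = \sum_{j=1}^N p_j\delta_{\theta_j}$ be the finite approximating
distribution of $F_0^*$. By relabelling, we can assume that
$p_1 \ge p_2 \ge \cdots \ge p_N \ge 0$. Let $M \le N$ be the number of strictly positive
mixing weights $p_j$. For every $\sigma > 0$, by Lemma A.8 and the inequality
$\sum_{j \ge M+1} W_j \le \sum_{j=1}^M |W_j - p_j|$,

$$(7.3)\qquad \|f_{F,\sigma} - f_{F_0',\sigma}\|_1 \le 2\sum_{j=1}^M |W_j - p_j| + \frac{2\|K\|_\infty}{\sigma}\sum_{j=1}^M p_j|Z_j - \theta_j|.$$

Let $v_1 := p_1$ and $v_j := p_j[\prod_{h=1}^{j-1}(1 - v_h)]^{-1}$ for $j = 2, \ldots, M$. Note
that $v_j \in (0, 1)$ for every $j = 1, \ldots, M$. By (4.3),

$$|W_j - p_j| \le |V_j - v_j|\prod_{h=1}^{j-1}(1 - V_h) + v_j\Big|\prod_{h=1}^{j-1}(1 - V_h) - \prod_{h=1}^{j-1}(1 - v_h)\Big| \le \sum_{h=1}^{j}|V_h - v_h|,$$

where the inequality $|\prod_{h=1}^{j-1} y_h - \prod_{h=1}^{j-1} z_h| \le \sum_{h=1}^{j-1}|y_h - z_h|$, valid for
complex numbers $y_1, \ldots, y_{j-1}$ and $z_1, \ldots, z_{j-1}$ of modulus at most 1, has
been used.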
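The product inequality invoked above is easy to probe numerically. The sketch below draws complex numbers in the closed unit disc and compares the two sides; the helper names are hypothetical and the check is an illustration, not a proof.

```python
import cmath
import math
import random


def product(values: list) -> complex:
    """Product of a finite list of complex numbers."""
    result = 1.0 + 0.0j
    for value in values:
        result *= value
    return result


def random_in_unit_disc(rng: random.Random) -> complex:
    """A point drawn uniformly from the unit disc (modulus at most 1)."""
    return cmath.rect(math.sqrt(rng.random()), rng.uniform(0.0, 2.0 * math.pi))


def product_gap_and_bound(y: list, z: list) -> tuple:
    """Return (|prod y_h - prod z_h|, sum_h |y_h - z_h|) for equal-length lists."""
    gap = abs(product(y) - product(z))
    bound = sum(abs(a - b) for a, b in zip(y, z))
    return gap, bound
```

In the proof the inequality is applied with $y_h = 1 - V_h$ and $z_h = 1 - v_h$, which are real numbers in $[0, 1]$.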
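If, for $0 < \epsilon \le \sigma_0/2$,

a) $\sum_{j=1}^M\sum_{h=1}^j |V_h - v_h| \le \epsilon$,

b) $\sum_{j=1}^M p_j|Z_j - \theta_j| \le \epsilon$,

c) $|\sigma - \sigma_0| \le \epsilon$,

then $\|f_{F,\sigma} - f_{F_0',\sigma_0}\|_1 \lesssim \|K_\sigma - K_{\sigma_0}\|_1 + \sum_{j=1}^M\sum_{h=1}^j |V_h - v_h| + \sum_{j=1}^M p_j|Z_j - \theta_j| \lesssim \epsilon$
by Lemma A.9 and (7.3).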

Next, we show that, for = a £ (or i? e = a £ + 1, the latter case being 
considered if any support point 8j of Fq is equal to — a £ and/or a e ), the 
events in a) and 6) together imply that, for < e < [(1/4) A (cr /2)], we 
have F{\—B e , B £ ]) > 1/2. This inequality is used when checking that, for 
a suitably chosen g £ (0, 1], M 2 = 0((l/e)£), with < ^ < k/w, so that 
Theorem 5 of Wong and Shen [33], pages 357-358, can be invoked. By the 
event in 6), for e small enough, all the Z^s are in [—B e , B £ ]. Using this fact 
and the inequality Ejf=i l^i ~ Pj\ — Ej=i El=i l^i ~~ v h\, the event in a) 
implies that F([-B £ , B £ ] c ) < Y,f =x ELi 1^ ~ «h| < e < 1/2- 

We estimate the probabilities of the events in a) and b). In view of the 
independence of the random variables Wj's and Zj's, by Lemma 6.1 and 
Lemma 6.2, for (1 — p\) > (4s/N 2 ) (in case p\ does not satisfy the condition, 
/i^,<r can be projected into a new density fF^,a which is within L 1 -distance
e from /ir CT . This new density can be obtained by first changing the point 
mass p\ to p\ such that (1 — p\) > (Ae/N 2 ) and then equally distributing 
the exceeding mass among the other M — 1 points), 

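$$P\Big(\sum_{j=1}^M\sum_{h=1}^j |V_h - v_h| \le \epsilon\Big) \times P\Big(\sum_{j=1}^M p_j|Z_j - \theta_j| \le \epsilon\Big)$$
$$\gtrsim \exp\{-c_1M^2\log(1/(1 - p_1))\} \times \exp\{-M[\log(\alpha(\mathbb{R})/(2\epsilon)) + ba_\epsilon^\delta]\}$$
$$\gtrsim \exp\{-c_2M^2\log(1/\epsilon)\},$$

because, by (7.1), $M \le N \lesssim (\log(1/\epsilon))^{2\tau-1}$, where $\tau > 1$, and, for
$\varpi < \infty$, we have $0 < \delta \le \varpi$ by assumption, so that $a_\epsilon^\delta \lesssim \log(1/\epsilon)$. Thus,
$\tilde\epsilon_n = n^{-1/2}(\log n)^{2\tau-1/2}$. □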

8. Proof of Theorem 4.3. The main difficulty in the proof rests in 
finding a finite mixing distribution, with a sufficiently restricted number of 
support points, such that the corresponding Gaussian mixture approximates 
the true density, in Kullback-Leibler divergence, with an exponentially small 
error. Such a mixing distribution may be found by matching the moments 
of an ad hoc constructed mixing density. The crux is the approximation of 
an analytic density with exponentially decaying Fourier transform by con- 
volving the Gaussian kernel with an operator, whose expression resembles 
a Taylor series expansion, with suitably calibrated coefficients and
derivatives convolved with the sinc kernel. Such a (not necessarily non-negative)
function turns out to be a convex linear combination of iterated convolu- 
tions of the true density with the Gaussian kernel, which has the effect of 
reducing the bias. Once this function is modified to be a density with sub- 
Gaussian tails, the compactly supported version obtained by re-normalizing 
the restriction to a compact set is employed to find an approximating mixing 
distribution. 

We start by stating the auxiliary result on the approximation of analytic
densities by convolutions with the Gaussian kernel. Let
$m_j := \int_{\mathbb{R}} y^j\phi(y)\,dy$ denote the moment of order $j$ of a standard normal.
For every $j \in \mathbb{N}$, we define two collections of numbers $c_j$ and $d_j$ as
follows. For $j = 1$, we set $c_1 = d_1 = 0$. For $j = 2$, we set $c_2 = 0$ and
$d_2 = m_2/2!$. For every integer $j \ge 3$,

$$c_j := -\sum_{\substack{j = k + l \\ k \ge 1,\ l \ge 1}} \frac{m_k m_l}{k!\,l!},
\qquad d_j := \frac{(-1)^j m_j}{j!} + c_j.$$

Since moments of all odd orders are null for the Gaussian kernel,

$$c_{2j} = -\sum_{\substack{j = k + l \\ k \ge 1,\ l \ge 1}} \frac{m_{2k}\,m_{2l}}{(2k)!\,(2l)!}.$$

Note that the numbers $c_j$ and $d_j$ only depend on the moments of $\phi$. For
$\sigma > 0$ and a function $f \in C^\infty(\mathbb{R})$, we define the transform

$$T_\sigma(f) := f - \sum_{j=1}^\infty d_j\sigma^j(f^{(j)} * \mathrm{sinc}_\sigma).$$

Lemma 8.1. For $\sigma > 0$ small enough and every probability density
$f \in C^\omega(\mathbb{R})$ with characteristic function satisfying (2.1) for some
constants $\rho, r > 0$ and $0 < L < \infty$,

$$(8.1)\qquad \|T_\sigma(f) * \phi_\sigma - f\|_\infty \lesssim e^{-(\rho/\sigma)^r/2}\,1_{\{\infty\}}(S_f).$$

Furthermore,

$$(8.2)\qquad T_\sigma(f) = 3f - 3(f * \phi_\sigma) + f * \phi_\sigma * \phi_\sigma + O(e^{-(\rho/\sigma)^r/2}\,1_{\{\infty\}}(S_f)).$$

Consequently, $\int T_\sigma(f)\,d\lambda = 1 + o(e^{-(\rho/\sigma)^r/2}\,1_{\{\infty\}}(S_f))$.

Proof. By definition of T CT (/), Taylor's formula and the assumption that 
/ G C""(M), for every x E R, 

(T a (f)*cf> a -f)(x) 

r X 00 
= / fix-y) -fix) - YVo-^/^ *sinc CT )(x - ?/) (j><r(y)dy 

L i=1 J 

= X] ( ( ~ 1 ^ mi ^/ (i) (x) - djV(/^ * sinc CT *^)(x) 
j=l V ^ 

= £ ( ^^V ^ 1 - / 0) * sine, - Cj V(/^ * sin C(T , 

j=i V J- / 

where, in the last line, the definition of the dj's is used. For every j G N, 
consider the decomposition 

(fU)-fU)^ sinc<j ^ a ^ x ) 

1 



2tt 



i-ity e - Ux fit)dt 

\t\>l/a 

+ [ i-itye-^fit^^dt-if^^smc^^ix) 
--■ Tiij, cr, x) +T 2 (j, cr, x). 
By the Cauchy-Schwarz inequality and the assumption that $f$ satisfies (2.1),
$T_1(j, \sigma, x) \lesssim \sigma^{-j}e^{-(\rho/\sigma)^r/2}\,1_{\{\infty\}}(S_f)$ for $\sigma$ small enough. Thus,

E ^p^-^m °, *) < e-^ r / H{oo}{Sf) 

because Y^jLi m j/j ] - < 00 ■ Next, we show that Ejt=i[( — l) Jm j (T "'^2(i 5 x )/j 1 -- 
Cj&^(f * sinc a *4> <J )(x)] = identically. Algebra leads to T 2 {j, c ; = 
" ££Li m 2fe a 2fe (/^'+ 2fe ) * sine, *^)(x)/(2fc)!. Hence, 

V ., J v J T 2 (j, a, x) 



oo 

= - E if, (E ^g^'a** 4 "' * •*)<*) J 

oo / oo \ 

=e - e i?Mr 2 * (/<2,,,sinc '* )W 

s=2 V 2k+2l=2s K >' y >' / 

oo 

= E c ^ cj2s (/ (2s) * sinc -*^)( 3; ) 

by definition of the numbers c^s- The proof of (8.1) is thus complete. 

Next, we prove (8.2). Because T^j, a, x) < a~ j e'^^ / 2 l {oo} {S f ) for 
a small enough, T a (f) = f~ £°°=i + 0(e-W°) r /*l {oo} (S f )). By 

definition of the dj's and taking into account that E^i( — l) J ' 77i i° J '/ /j' = 
/ * 4> a - /, we have / - c^/W = /-(/*&,-/)- i = 
2/ - / * ^ - E£Li c^'M where 

oo oo / oo \ 

E •v"/*' = - E §p (E ^- 2t f' 2H2t> ) 

oo 

= -E^ 2j (/ (2j) *^-/ (2j) ) 

= -(/*^-/)*0«r + (/*0«T-/) 
= ~f 4>v + 2(/ * <M - /• 

Relationship (8.2) follows. To bound above J T a (f) dA, note that the coeffi- 
cients in (8.2) sum up to 1. Also, for a small enough, 

iKi, a, x)l {z .. Tl{j ,^ z)m (x)dx = o{o-ie-W°r' 2 ) 
because $\lim_{\sigma\to 0} T_1(j, \sigma, x) = 0$ identically so that
$\lambda(\{z : T_1(j, \sigma, z) \ne 0\}) = o(1)$. □
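Relationship (8.2) has a simple reading in the Fourier domain, which explains why the transform reduces the bias. Assuming the usual convention that $\phi_\sigma$ has Fourier transform $g(t) = e^{-\sigma^2t^2/2}$ and ignoring the exponentially small remainder, the main part of $T_\sigma(f)$ acts as the multiplier $3 - 3g + g^2$, so that $T_\sigma(f) * \phi_\sigma - f$ corresponds to $(3 - 3g + g^2)g - 1 = -(1 - g)^3$, compared with $g - 1$ for plain Gaussian smoothing. The sketch below checks this algebra numerically; the function names are illustrative.

```python
import math


def gaussian_multiplier(sigma: float, t: float) -> float:
    """Fourier transform of the N(0, sigma^2) density at frequency t."""
    return math.exp(-0.5 * (sigma * t) ** 2)


def transform_multiplier(sigma: float, t: float) -> float:
    """Multiplier of the main part of T_sigma in (8.2): 3 - 3 g + g^2."""
    g = gaussian_multiplier(sigma, t)
    return 3.0 - 3.0 * g + g * g


def smoothed_bias_multiplier(sigma: float, t: float) -> float:
    """Multiplier of T_sigma(f) * phi_sigma - f, which equals -(1 - g)^3."""
    g = gaussian_multiplier(sigma, t)
    return transform_multiplier(sigma, t) * g - 1.0
```

At $t = 0$ the multiplier $3 - 3g + g^2$ equals 1, which is the Fourier-side counterpart of the statement that the coefficients in (8.2) sum up to 1.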

Remark 8.1. The technique developed in Lemma 8.1, properly combined
with the arguments of Lemma 3.4 of de Jonge and van Zanten [4],
pages 3317-3318, can be exploited to find an approximation of a density
$f$ contained in a Sobolev space of order $\beta$, using the fact that the Fourier
transform satisfies $\int_{\mathbb{R}}(1 + |t|^2)^\beta|\hat f(t)|^2\,dt < \infty$. We do not report on this
aspect in the present article.

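Let $f_0$ be a probability density satisfying (a). Given $\alpha, \beta \in (0, 1)$ and
$\sigma > 0$, we define the sets

$B_\sigma := \{x \in \mathbb{R} : f_0(x) \ge B\sigma^{-M}e^{-c_1(1/\sigma)^{r_0}}\}$, with $0 < c_1 < \rho_0^{r_0}/2$,

$G_\sigma := \{x \in \mathbb{R} : T_\sigma(f_0)(x) > \alpha f_0(x)\}$,

$U_\sigma := \{x \in \mathbb{R} : |f_0^{(j)}(x)| \le D(j, \sigma),\ j \in \mathbb{N},\ Q(x) \le \sigma^{-\beta}\}$,

where $D(j, \sigma) := \sigma^{-j}e^{c_2j(1/\sigma)^{r_0}}2^{-j/r_0}\rho_0^{-j}[\Gamma((2j + 1)/r_0)]^{1/2}$. The function
$T_\sigma(f_0)$ is modified to be non-negative by setting it equal to a multiple of
$f_0$ when it is below it. Thus, we define $g_\sigma := T_\sigma(f_0)1_{G_\sigma} + \alpha f_0 1_{G_\sigma^c}$.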
Lemma 8.2. Suppose $f_0$ is a probability density satisfying (a)-(b) for
$q > 2$, $0 < r_0 < q/(3q - 2)$ and $r_0(q - 1)/q < \beta < 1 - r_0/2$. For fixed
$\zeta \in (0, 1)$, let $\alpha := C_\zeta^2/2$, where $C_\zeta$ is the constant defined in Lemma A.11.

Then, for $\sigma > 0$ small enough, $\int g_\sigma\,d\lambda \ge \alpha$ and
$\int g_\sigma\,d\lambda = 1 + O(e^{-c_3(1/\sigma)^{r_0}})$ for a suitable constant $c_3 > 0$.

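Proof. By definition, $g_\sigma \ge \alpha f_0(1_{G_\sigma} + 1_{G_\sigma^c}) = \alpha f_0$ so that
$\int g_\sigma\,d\lambda \ge \alpha$. Because $g_\sigma$ can be written as
$g_\sigma = T_\sigma(f_0) + [\alpha f_0 - T_\sigma(f_0)]1_{G_\sigma^c}$, by (8.2),

$$\int g_\sigma\,d\lambda = 1 + o(e^{-(\rho_0/\sigma)^{r_0}/2}\,1_{\{\infty\}}(S_{f_0})) + \int[\alpha f_0 - T_\sigma(f_0)]1_{G_\sigma^c}\,d\lambda
= 1 + O(e^{-c_3(1/\sigma)^{r_0}})$$

since, for a suitable constant $c > 0$,

$$(8.3)\qquad \int[\alpha f_0 - T_\sigma(f_0)]1_{G_\sigma^c}\,d\lambda = O(e^{-c(1/\sigma)^{r_0}}).$$

This statement is proved using both the assumption that $\log f_0$ is locally
Lipschitz continuous and the monotonicity assumption (b). For a fixed
constant $k > 0$ and a fixed $x \in \mathbb{R}$, let
$N_{x,\sigma} := \{y \in \mathbb{R} : |y - x| \le k\sigma^{1-r_0/2}\}$. Let $\sigma$ be small enough so that
$k\sigma^{1-r_0/2} \le \delta$. For every $y \in N_{x,\sigma}$, we have
$f_0(y) \le f_0(x)\exp\{Q(x)|y - x|\}$. Consequently,

$$(f_0 * \phi_\sigma)(x) \le f_0(x)\int_{N_{x,\sigma}}\exp\{Q(x)|y - x|\}\phi_\sigma(x - y)\,dy + ((f_0 1_{N_{x,\sigma}^c}) * \phi_\sigma)(x).$$

For $x \in U_\sigma$ and a suitable constant $C > 0$,

$$\int_{N_{x,\sigma}}\exp\{Q(x)|y - x|\}\phi_\sigma(x - y)\,dy \le \int[1 + CQ(x)|y - x|]\phi_\sigma(x - y)\,dy
\le 1 + C\sigma Q(x) = 1 + O(\sigma^H),$$

with $H := (1 - \beta) > 0$. Also, $((f_0 1_{N_{x,\sigma}^c}) * \phi_\sigma) \lesssim e^{-k_1(1/\sigma)^{r_0}}$ for a
suitable constant $k_1 > 0$. Therefore,

$$(8.4)\qquad \forall\,x \in U_\sigma,\quad (f_0 * \phi_\sigma)(x) \le f_0(x)[1 + O(\sigma^H)] + C_1e^{-k_1(1/\sigma)^{r_0}},$$

the complement set $U_\sigma^c$ having exponentially small probability. In fact,
$P_0(U_\sigma^c) \le P_0(\exists\,j \in \mathbb{N} : |f_0^{(j)}(X)| > D(j, \sigma)) + P_0(Q(X) > \sigma^{-\beta}) =: P_1 + P_2$.
By (2.1),

$$(8.5)\qquad \forall\,j \in \mathbb{N},\quad |f_0^{(j)}(x)| \lesssim 2^{-j/r_0}\rho_0^{-j}[\Gamma((2j + 1)/r_0)]^{1/2} =: B_0(j),$$

which, together with Markov's inequality, implies that

$$P_1 \le \sum_{j\ge1}[D(j, \sigma)]^{-1}E_0[|f_0^{(j)}(X)|] \lesssim \sum_{j\ge1}\sigma^je^{-c_2j(1/\sigma)^{r_0}} \lesssim e^{-k_2(1/\sigma)^{r_0}}.$$

Also, $P_2 \le e^{-(1/\sigma)^{\beta q/(q-1)}}E_0[\exp\{[Q(X)]^{q/(q-1)}\}] \lesssim e^{-(1/\sigma)^{r_0}}$. Hence,
$P_0(U_\sigma^c) \lesssim e^{-k_3(1/\sigma)^{r_0}}$. Next, we show that $B_\sigma \cap U_\sigma \subseteq G_\sigma$. Let
$\zeta \in (0, 1)$ be fixed. For $\sigma$ small enough and every $x \in B_\sigma \cap U_\sigma$, by (8.2),
(8.4) and Lemma A.11,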

Wo) = 3/o - 3(/ * + fo*<t>**<t>* + O(e-^ r °/ 2 l {oo] (S f0 )) 
> {Cl - 30(a H ) - 3C ie - k ^ r ° - C 2 a M )f > C ? 2 / /2, 

because, on B a , we have eT^I^ I 2 / f = 0(a M ). Taking a = C%/2, we 
have T CT (/o) > a and the inclusion B a n U a C G CT holds. To prove (8.3), 
it suffices to show that f[af - r CT (/o)]l B c UC/ c dA < e -c(W>. This can be 
shown using the bounds P (E/J) < e^ 1 /^ , P (^) < ( o --^ e - Cl ( 1 /^) r0 )7 
for every 7 G (0, 1), and the fact that, up to O(e - ^°/ (7 )'' / 2 l{ t3O }(5'/ )), the 
transform T a (fo) is a linear combination of /o, /o * CT an d /o * 0<r * We 
begin by showing that $\int_{U_\sigma^c}(f_0 * \phi_\sigma)\,d\lambda \lesssim e^{-(1/\sigma)^{r_0}/2}$. For random variables

Y ~ /o and Z ~ N(0, 1), * 

/ (/o*^) dA < P(Y +( jZ G [£, \Z\ < a~ r °/ 2 )+F(\Z\ > a~ r °/ 2 ) =: T 1+ T 2 , 

where T 2 < e - ^ 1 ^' ' 2 and Ti = because, for every x £ U£, there exists 
(at least) one j G N such that |/q (a;)| > -D(j, ff) -)• oo as a -> 0. On the 
other hand, for every j G N, for \Z\ < o~~ r °/ 2 , using the bound B$(j) i n (8-5) 
with the greatest value r > 2 that guarantees YlT=i \fo +k \Y)\/k\ < oo, 

\f^(Y + aZ)\ < |#(F)| + aW2^ l/o J+fc |(y)l < 1 + a W2 ^ c < 00> 

fc=i 

which implies that |/q | is bounded away from oo, thus contradicting the 
previous statement. By the result just shown, f Bc (/o * (f) a ) dA < J B c nU (fo * 
tj> ) dA + e -( 1 /O r °/2 5 where; for £ > 1; 

I (/o * </v) dA < P(Y + <rZ g b% n f/ CT , |Z| < a~ r °/ 2 , Fe%n t/ CT ) 

+ P(Y G E£) + P(y G B^) + P(|Z| > a~ r °/ 2 ) 

< p -fc4(l/<x) r 

The probability of the first event can be shown to be equal to similarly 
to Kruijer et al. [20], page 1251: on the one hand, since Y + aZ G B% and 

Y G B^, we have |log/o(Y+<rZ)-log/ (Y)| > (l-f- ro )<7 _ro oo. On the 
other hand, since Y G U a and \Z\ < a~ r °/ 2 , | log f (Y + crZ) - log/ Q0| < 
Q(Y)a\Z\ < cr( 1 -' r °/ 2 )- /3 ->■ 0, in contradiction with the previous statement. 
Analogously, it can be shown that J B c UUC (fo * 4><j * ^o-)dA < e -c ( 1 /°T ) 
which completes the proof. □ 

Next, a finite Gaussian mixture is constructed from $g_\sigma$ such that it
approximates the true density, in Kullback-Leibler divergence, with an
exponentially small error.

Lemma 8.3. Suppose $f_0$ is a probability density satisfying (a)-(c) for
$q$, $r_0$ and $\beta$ as in Lemma 8.2. Then, for $\sigma > 0$ small enough, there exists a
finite Gaussian mixture $m_\sigma$ having at most $N_\sigma = O((a_\sigma/\sigma)^2)$ support points
in $[-a_\sigma, a_\sigma]$, with $a_\sigma = O(\sigma^{-r_0/2})$, such that, for constants $S, c_5 > 0$,

$$(8.6)\qquad \max\{\mathrm{KL}(f_0; m_\sigma),\ E_0[(\log(f_0/m_\sigma))^2]\} \lesssim \sigma^{-S}e^{-c_5(1/\sigma)^{r_0}}.$$
Proof. We give the proof only for the bound on $\mathrm{KL}(f_0; m_\sigma)$, which is
decomposed into the sum of three integrals, see (8.8) below. Fix £ G (0, 1) 
and let a := Cj/2. Set C 9a := J g a dA, by Lemma 8.2, C g<T = l+Ae~ c ^ l /^ ro , 
where A > is a suitable constant. Defined the probability density h a := 

9a/C 9a , 

V<7<t c , ^ > T _ i -_ TO ^ > _____ /0| 

because g CT > a/o and Lemma A. 11 applies. Furthermore, \h a * <p a — / | < 
C- 1 !^ * K - M + \C~} - l|/o < \g a * CT - /o| + e- c 3(V-)^ /„. Lemma 8.1 
and J[a/ - T CT (/ )]l Gg dA < /[a/o - T CT (/ )]l B c ucr c dA < e - c (V-) r ° imp i y 

\9* * ^ - /o| < |T CT (/ ) * fa - f \ + | [(a/o - T CT (/ ))l G c] * CT | 

< e -(po/^/2 1{oo}(5/o) + CT -i e -c ( i/.ro_ 

Therefore, 

(8.7) ||/ iff *^-/o||oo<e- C4 ( 1 / CT ) r °. 
Now, consider 

KL(/ ; h <J *<t>*)= \ I + I )(fo log r -^— ) dA =: h + J 2 . 

\JB a C\Uv JBZUUZJ V <p„ J 



For ci < c 4 , by (8.7), 



{ < suP^gB^n^ l/o(g) ~ {K * <Pq){x)\ r 

1 ~ mf x£Ba f (x) - sup xeBanUa \f Q (x) - (ha * (f>a){x)\ J Ba nU* 

< e < e -(c4-ci)(l/«r)n) i 



e -ci(l/ff)D (^o-M _ £) e -(c4-ci)(l/ff)D) 

Because / BJfu0 , /„ dA < .-TMe^W^o + e -k s (i/<^ ? 

I 2 < ( ff -7^ e -7Pi(V ff )n) +e -fc»(V^)log((l + ^ e -«8(V^)/( a C f )), 
where the logarithmic term is positive because < aC^ < 1. Thus, 
KL(/ ; ha * CT ) < CT - 7M e- min{(c4 - Cl) ' 7Cl ' fc3}(1/<T)ro . 

Next, let C ha := f^h a d\ and define h a := ^ CT l[-a CT ,a CT ]/Cft CT to be the 
re-normalized restriction of h a to [— a a , a-a}- By Lemma A. 6, there exists a 
discrete distribution $F$ on $[-a_\sigma, a_\sigma]$, with at most $N_\sigma = O((a_\sigma/\sigma)^2)$ support
points, such that \\h a *(/)& — F * 0o-||oo ^ a~ l e~ Na . Set rh a := Ch a {F * 4> a ), 

\h a *4> a - m a \ < \\h a * (j) a — F * <t> a \\oc + {K\-a„ ,a a ] c ) * $° 
<(lV'|(ll ff l haii4 )*i. 

Because /o satisfies (c), for fixed ij £ (0, 1) and a small enough, we have 

(M[_a CT ,a CT ]0 * 4> a < e- (po/<T)ro/2 l {o c}(S/ ) + <j>{m*)Q-[a a ,a„Y * in virtue 
of Lemma 8.1 and Lemma 14 in Maugis and Michel [23], page 47. Thus, for 
a suitable constant c" > 0, 



+ e-^r°/ 2 l {oo} (5 /0 ) + ^(r ? a CT )<e- c "(V^. 



Now, define t := fh a + D a (/) a , with £> CT := f7 -(«-i) e -£(VO ro , E > 1, and the 
finite Gaussian mixture with density 

t m a + D a (f) a 



JtdX C h „ + D a • 
Write 

KL(/ ; m a ) = KL(/ ; h CT *fa)+ f fo log dA + / / log — dA 

J t J m a 

(8.8) =: J1 + J2 + J3. 

• Contra/ 0/ Ji. It has been shown that Ji < cr^V min{( C4 - C i),7ci,fc3}(i/<T) r o . 

• Control of J 2 . Write J 2 = (J^ + J B c)fo log((/i CT *4>a)/t) dA =: J21 + J22, 
where, for ci < (c" A c), 

J 21 < / /o^f^dA 

a -R e -{c"hc){l/aY0 r 

~ SO— M" e -Cl(l/(r) r _ g-c"(l/cr)''0 ' ^° ^ 



< _Jlf-ii_-[( C "Ac)-Cl](l/«T) p 



because \h a *fa-t\ < \h (7 *cf) (7 -m\+D a 4> a < a- R eT^ '^)( 1 M r ° and, over B a , 

K*fa>fo> Bo- M e-^ l ^ r ° so that t > m a > K * fa - \h a * fa - m a \ > 
a -M e - cl (l/*yo _ e -c"(l/*yo_ Becauge t < oo an d f > D a fa, 

J22 < \og{a/D a ) [ f dA + -L / x 2 / (x) dx 
J B c 2(7^ J B c 

< cr -(7iW+r- ) e -7ci(l/<r) r o + ^ ( 7 Jlf+2) e -7ci(l/a) r o 

< cr -[ 7 M+(2Vro)] e - 7Cl (l/<7)''o_ 
• Control of $J_3$. Noting that $t/m_\sigma = C_{h_\sigma} + D_\sigma \le 1 + D_\sigma$, we have
$J_3 \le \log(1 + D_\sigma) \le D_\sigma = \sigma^{-(R-1)}e^{-E(1/\sigma)^{r_0}}$.

Combining partial results, for $c_1 < \min\{c_4, c'', c\}$, we have
$\mathrm{KL}(f_0; m_\sigma) \lesssim \sigma^{-S}e^{-c_5(1/\sigma)^{r_0}}$, where $S$ and $c_5$ are suitable positive
constants. The same reasoning applies to bound $E_0[(\log(f_0/m_\sigma))^2]$ and
(8.6) follows. □

Proof of Theorem 4.3. As in Theorem 4.1, we prove the result for the
$L^1$-distance. Next, we deal with the $L^p$-metrics, for $2 \le p \le \infty$. The cases
where $1 < p < 2$ are covered by interpolation.

• $L^1$-distance. The entropy condition (2.8) and the remaining mass
condition (2.9) in Theorem 2.1 of Ghosal and van der Vaart [10], page 1239,
can be shown to be satisfied as in Theorem 4.1 and Theorem 4.2 of
Scricciolo [30], pages 285-289, computing the remaining mass as in
Theorem 4.1.

• $L^p$-metrics, $2 \le p \le \infty$. Same proof as in Theorem 4.1.

• Small ball probability. We show that, for constants cx, ci > 0, (II <g> 
G)(B KL (f ; el)) > ci exp{-c 2 ne ;2 }, with e n = n^ 1 / 2 (logn) c , for a suit- 
able ? > independent of r^. By Lemma 8.3, for a small enough, there 
exists a finite Gaussian mixture m a , with N a < (a a /a) 2 support points 

where a a = 0{a~ r °/ 2 ), such that (8.6) holds. Let 
Pi, . . . , pN a denote the mixing weights of m a . The inequality in (8.6) holds 
for any Gaussian mixture m a i , with a' € [a, a + e -rfl ( 1//<T ) r °], having sup- 
port points 9[, 9' Na such that Y^f=l Wj ~ °j\ ^ e^ 1 /^ and mixing 
weights p\, . . . , p' Na such that Ylf=i \Pj ~ Pj\ — e - ^ 3 ^ 1 / "^ for suitable con- 
stants d u da, d 3 > 0. Let B a := {f > with Cx := B'a~ s ' e~ c '^ r ° , 
where S' := (S — 2)/7, with (1/2) < 7 < 1 and d < 3cs, the constants S and 
C5 being those appearing in (8.6), 7 being arbitrarily fixed (note that this 7 
is different from the one appearing in (A3)). For any probability measure F 
on R and a> G [a, a + e^ 1 ^ ], 

(8.9) KL(/ ; < a- s e~^° + (7 + Q (fo*Kj£) ■ 

We provide an upper bound on the second integral. For any F such that 
F([-a a , a a }) > 1/2, we have f F ,A x ) £ o-" 1 exp{-{x 2 + a 2 a )/(a') 2 } for all 
x G R. From Lemma 8.3, Hm^'Hoo < o" -1 . Also, fg c (x/a) 2 fo(x) dx < o~ 2 Q 
and Jg c /odA < £j. Therefore, for a suitable constant c" > 0, 

/ /ologf^dA< / ^ 2 Mx)dx+(^) 2 I f 6\<e-*W*r°. 
• Dirichlet process. Clearly,

/ / log(m CT /// FiO ./)dA < / /o(||m CT / - / FjCT /|| 00 // Fi(T /)dA. 
J H„ J So- 

Using Lemma 5 of Ghosal and van der Vaart [12], page 711, we get \\m a t — 
If, <j' || oo KU 3 )+o- 1 E.f=i \F(Uj)-Pj\, where U Q , . . . , U Na 



is a partition of R, with f/o := (Uj^i ^j) c ano ^ 3 % f° r i = 1> 
Since 2a a /N a = 2a 2 /a a > a -S" e -c"{i/ay» for gome s „ &nd ^ the support 

points of m CT ' can be taken to be at least cr -3 ^" 2 ^ -3 ^ 1 / ")' -separated. If 
not, m a i can be projected onto a mixture m with <7~ 3 ( 5 ~ 2 )e _3c5 ( 1 / <J ) r °- 
separated points, such that ||m CT / — m^., ||oo < a~^ 3S ~^e~ 3c5 ^ 1 ^ a ^ r ° . Thus, we 
can find disjoint intervals U±, . . . , U^ a so that Uj 3 6j and 



a 



-3(S-2) e -3c 6 (l/a)-o < A([/ . ) < 2(T _3(5-2) e -3 C5 (l/ CT )-o ^ j = 1, ■ ■ ■ , N a . 



Let F be such that 

(8.10) \ F Vi) ~Pi\ ^ a-^ s -^e-^ l /^\ 

i=i 

Then, \\m a , - / FX |U < a'^-Ve' 3 ^ 1 /^ and, on B a , f F ^ > m a , - 
a -(3S-4) e -3c 5 (i/*y > Cct _ Therefore5 f Q l g(m^ /f F ,a>) dA < a-Se-KiiM* . 

Note that for F satisfying (8.10), F([— a a , a a ]) > 1/2. Combining partial re- 
suits, max{KL(/ ; f Fj(T >), E[(log(/ //F, CT 0) 2 ]} < a^e"^ 1 /^ . T o apply 
Lemma A. 2 of Ghosal and van der Vaart [10], pages 1260-1261, to estimate 
the prior probability of {F : Y^f=i \F(Uj)-p,\ < -(35-5) e -3 C5 (i/<r)''o ^ note 
that, since < 5 < 2, a{Uj) > \{Uj) mf m < aa a'(8) > a -HS-2) e -(3c 5+ b)(i/ayo _ 
Also, AT (T(J -(3S-5) e -3 C 5(i/<T)'-o < i XhuS; 

(n®G)(s KL (/o; ^e"^ 1 /^ )) 

^PrQ^ + e-^ro]) 

X Pr (I F : £ \FiUj) - Pj \ < a -{3S-S) ^(l/^ 

> exp{-d/a 2 - 2d 1 (l/a) r ° - c 7 N a (l/a) ro } > exp {-c' N a (l/a) ro }, 

for a suitable constant d > 0. Taking a = (logn) _1 / r °, tq = 1/3 and S" = 
8/3, we find (II ® G)(B KL (f ; e 2 )) > exp{-c 2 ne 2 }, for e 2 = n" 1 /2(i g n )4. 
• Pitman-Yor process with d > 0. It is enough to note that

M , M 

\m„ 



j M M 

V - ||oo < - I W i - Pi I + ^2 



i=i i=i 

and then proceed to estimate the probabilities in a) and b) as in Theorem 4.1.
Again, the rate is $\epsilon_n = n^{-1/2}(\log n)^{\iota}$, for a suitable constant $\iota > 0$ independent
of $r_0$.

• NI-G process. Same proof as for the Dirichlet process. □ 

APPENDIX: AUXILIARY RESULTS 

This Appendix reports some auxiliary results. Proofs that are an adapta- 
tion of those of results known in the literature are omitted. 

The following proposition establishes the analyticity of EPD's with shape 
parameter that is an even integer. 

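Proposition A.1. For $\theta \in \mathbb{R}$, $\sigma > 0$ and $m \in \mathbb{N}$, let $\hat f_{\theta,\sigma,2m}$ be the
characteristic function of an EPD$(\theta, \sigma, 2m)$. Then, $\hat f_{\theta,\sigma,2m}(t) \le B e^{-ct^{2}}$,
$t \in \mathbb{R}$, where $B, c > 0$ are constants depending only on $\sigma$ and $m$. The
corresponding EPD is analytic.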

Proof. The assertion trivially holds for $m = 1$, which corresponds to
the case of a Gaussian distribution. For $m \ge 2$, let $\varpi_{k,m} := (2k + 1)/(2m)$.
From (2.4),

f m v 7 ^ ^ r(tt fc , m ) [-(^(2m)V( 2 '")/2)2]fe 

/e - 2m(t) -rWM^ %M x ibi ' 4 

cf. (6) in Pogany [28], page 50. Applying Gauss' multiplication formula, see, 
e.g., Abramowitz and Stegun [1], page 256, 



fd,a, 2m(t) 



r(l/(2m)) 

r(^ fcim )[-(at(2m) 1 /(2m) /2) 2 ]fc/A . ! 



oo 

X 



2 (27r)-(^-i)/2m^,-- 1 /2) n^r^^+i/m) 



( 27r )W2 ^ 



V^r(l/(2m)) 



fc=0 



i=i 



m 2m 



-i 



-(CTi(2m) 1 /(2m)/ 2 )2]fe 



x 



fc! 
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because m k < 2 k < 1 for all k = 0, 1, . . .. Next, an upper bound on m , 
independent of $k$, is provided. To this aim, note that the Gamma function
is continuous on $(0, \infty)$ and attains its minimum at a point $z^* \in (1, 2)$.
Therefore, for every m > 2, we have T k ^ m < T k *^ m for all k = 0, 1, . . ., 
where k* is the smallest positive integer such that 1 < [k* + j + 1/2] /m < 2, 
j = 1, . . . , m — 1. Thus, 



V2r(l/(2m)) 

Hence, /g CT 2m(0 ^5 e c l* I for large |t|, and, in virtue of Theorem 11.7.1 in 
Kawata [19], page 439, the corresponding distribution function is analytic. 

□ 

In the next lemma, the sinc kernel is shown to have bounded quadratic
variation. By definition, a function $h$ is of bounded $p$-variation on $\mathbb{R}$, $p \ge 1$
real, if
\[
v_p(h) := \sup\Big\{ \Big( \sum_{k=1}^{n} |h(x_k) - h(x_{k-1})|^{p} \Big)^{1/p} : -\infty < x_0 < \ldots < x_n < \infty,\ n \in \mathbb{N} \Big\}
\]
is finite.

Lemma A.4. The function $x \mapsto \mathrm{sinc}(x)$ has bounded quadratic variation.

Proof. It is shown that $v_2(\mathrm{sinc}) < \infty$. For every $n \in \mathbb{N}$, the sum
$\sum_{k=1}^{n} [\mathrm{sinc}(x_k) - \mathrm{sinc}(x_{k-1})]^2$ is maximum for $x_k = (2k+1)\pi/2$, $k = 1, \ldots, n$.
Splitting the sum into two parts,



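\[
\sum_{1 \le k = 2j \le n} [\mathrm{sinc}(x_k) - \mathrm{sinc}(x_{k-1})]^2 = \frac{4}{\pi^2} \sum_{1 \le 2j \le n} \left[ \frac{8j}{(4j+1)(4j-1)} \right]^2
\]
and
\[
\sum_{1 \le k = 2j+1 \le n} [\mathrm{sinc}(x_k) - \mathrm{sinc}(x_{k-1})]^2 = \frac{4}{\pi^2} \sum_{1 \le 2j+1 \le n} \left[ \frac{4(2j+1)}{(4j+3)(4j+1)} \right]^2 .
\]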

Thus, $v_2(\mathrm{sinc}) < \infty$ as a consequence of $\sum_{j=1}^{\infty} j^{-2} < \infty$. □
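The boundedness claimed in the proof can be checked numerically. The sketch below is illustrative only: it assumes the unnormalized convention sinc(x) = sin(x)/x (a different normalization rescales every term by a constant), and the function names are hypothetical. It evaluates the sum of squared increments at the points $x_k = (2k+1)\pi/2$ and compares it with a summable majorant; it does not verify that these points maximize the sum.

```python
import math


def sinc(x: float) -> float:
    """Unnormalized sinc, sin(x)/x (assumed convention), with sinc(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def squared_increments(n: int) -> float:
    """Sum of [sinc(x_k) - sinc(x_{k-1})]^2 for k = 1..n, x_k = (2k+1)*pi/2."""
    total = 0.0
    for k in range(1, n + 1):
        x_k = (2 * k + 1) * math.pi / 2
        x_prev = (2 * k - 1) * math.pi / 2
        total += (sinc(x_k) - sinc(x_prev)) ** 2
    return total


def closed_form(n: int) -> float:
    """Same sum, using sinc(x_k) = (-1)^k * 2 / ((2k+1) * pi)."""
    return sum((64 / math.pi**2) * (k / (4 * k * k - 1)) ** 2
               for k in range(1, n + 1))


# Since k / (4k^2 - 1) <= 1 / (3k), each term is at most (64 / (9 pi^2)) / k^2,
# and sum_k 1/k^2 = pi^2 / 6, so every partial sum is below 64 / 54.
UPPER_BOUND = 64 / 54

partial_sums = [squared_increments(n) for n in (10, 100, 1000, 10000)]
```

The partial sums increase with n but stay below the constant majorant, which is the comparison with $\sum_j j^{-2}$ used in the proof.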

The following lemma provides an upper bound on the $L^p$-norm, $1 \le p < 2$,
in terms of the product of the $L^\infty$-norm and any $L^q$-norm, $q > 1$. The proof
is similar to that of statement (6) of Lemma 4 by Nguyen [25], pages 18 and
24.






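Lemma A.5. Let $f, g \in L^\infty(\mathbb{R})$ be probability densities with $E_f[|X|^u] < \infty$
and $E_g[|X|^u] < \infty$ for some real $u > 0$. For every $1 \le p < 2$ and $t > 0$
such that $pt > 1$,
\[
\|f - g\|_p^p \le (s^{-1} + u) \left\{ s\, u^{-us}\, 2^{u}\, \|f - g\|_{pt}^{pus}\, \|f - g\|_\infty^{p-1} \left( E_f[|X|^u] + E_g[|X|^u] \right) \right\}^{1/(1+us)},
\]
where $s^{-1} := 1 - t^{-1}$.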

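Proof. For every $R > 0$, by Hölder's inequality, $\int_{|x| \le R} |f(x) - g(x)|^p \, dx \le (2R)^{1/s} \|f - g\|_{pt}^{p}$. Also, $\int_{|x| > R} |f(x) - g(x)|^p \, dx \le R^{-u} \|f - g\|_\infty^{p-1} (E_f[|X|^u] + E_g[|X|^u])$. Thus,
\[
\|f - g\|_p^p \le \inf_{R > 0} \left\{ (2R)^{1/s} \|f - g\|_{pt}^{p} + R^{-u} \|f - g\|_\infty^{p-1} \left( E_f[|X|^u] + E_g[|X|^u] \right) \right\}.
\]
The inequality in the assertion follows from $\min_{x > 0} (A x^{\alpha} + B x^{-\beta}) = (\alpha + \beta) [ (A/\beta)^{\beta} (B/\alpha)^{\alpha} ]^{1/(\alpha + \beta)}$ for every $A, B, \alpha, \beta > 0$. □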
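The minimization identity that closes the proof is easy to check numerically. The following is a minimal sketch, with hypothetical function names, that compares the closed-form value with the objective at the stationary point $x^* = (\beta B / (\alpha A))^{1/(\alpha+\beta)}$ and with a brute-force grid search around it.

```python
def closed_form_min(A: float, B: float, alpha: float, beta: float) -> float:
    """(alpha + beta) * [(A/beta)^beta * (B/alpha)^alpha]^(1/(alpha + beta))."""
    return (alpha + beta) * ((A / beta) ** beta * (B / alpha) ** alpha) ** (
        1.0 / (alpha + beta)
    )


def minimizer(A: float, B: float, alpha: float, beta: float) -> float:
    """Stationary point of x -> A x^alpha + B x^(-beta) on (0, infinity)."""
    return (beta * B / (alpha * A)) ** (1.0 / (alpha + beta))


def objective(x: float, A: float, B: float, alpha: float, beta: float) -> float:
    return A * x**alpha + B * x ** (-beta)


def grid_min(A: float, B: float, alpha: float, beta: float, num: int = 20001) -> float:
    """Brute-force minimum over an even grid on [x*/10, 10 x*]."""
    x_star = minimizer(A, B, alpha, beta)
    lo, hi = x_star / 10.0, x_star * 10.0
    return min(
        objective(lo + (hi - lo) * i / (num - 1), A, B, alpha, beta)
        for i in range(num)
    )


# Arbitrary illustrative parameter choices (not taken from the paper).
CASES = [(1.0, 1.0, 1.0, 1.0), (2.0, 3.0, 0.5, 1.5), (0.3, 5.0, 2.0, 0.7)]
```

In the proof, the identity is applied with $x = R$, $\alpha = 1/s$, $\beta = u$, $A = 2^{1/s} \|f - g\|_{pt}^{p}$ and $B = \|f - g\|_\infty^{p-1}(E_f[|X|^u] + E_g[|X|^u])$.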

The following lemma provides an upper bound on the number of com- 
ponents of a mixture, whose kernel density has Fourier transform satisfying 
(2.1), which uniformly approximates a given compactly supported mixture 
with the same kernel. 

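Lemma A.6. Let $K$ be a probability density with characteristic function
satisfying (2.1) for some constants $\rho, r > 0$ and $0 < L < \infty$. Let $0 < \epsilon < 1$,
$0 < a < \infty$ and $\sigma > 0$ be given. For any probability measure $F$ on $[-a, a]$,
there exists a discrete probability measure $F'$ on $[-a, a]$, with at most
\[
N \lesssim \max\{\log(1/\epsilon),\ (a/\sigma)\}, \quad \text{if } S_K < \infty,
\]
and
\[
N \lesssim
\begin{cases}
\log(1/\epsilon), & \text{if } 0 < r < 1 \text{ and } \rho\sigma/a = O((\log(1/\epsilon))^{(1-r)/r}), \\
\log(1/\epsilon), & \text{if } r = 1 \text{ and } a/(\rho\sigma) < e^{-1}, \\
\max\{\log(1/\epsilon),\ (a/\sigma)^{r/(r-1)}\}, & \text{if } r > 1 \text{ and } a/(\rho\sigma) > e^{-1},
\end{cases}
\quad \text{if } S_K = \infty,
\]
support points, such that $\|F * K_\sigma - F' * K_\sigma\|_\infty \lesssim \epsilon/\sigma$.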

Proof. By Lemma A.1 of Ghosal and van der Vaart [10], page 1260,
there exists a discrete probability measure $F'$ on $[-a, a]$, with at most $N + 1$
support points, $N$ being a positive integer to be suitably chosen later on,
such that it matches the (finite) moments of $F$ up to the order $N$,
\[
E_{F'}[\theta^j] := \int \theta^j \, dF'(\theta) = \int \theta^j \, dF(\theta) =: E_F[\theta^j], \qquad j = 1, \ldots, N. \tag{A.11}
\]
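To make the moment-matching condition (A.11) concrete, here is a small sketch for one special case only: $F$ uniform on $[-a, a]$. It is not the construction behind the cited lemma; it swaps in Gauss-Legendre quadrature, whose $N + 1$ nodes and normalized weights define a discrete probability measure on $[-a, a]$ that reproduces the moments of the uniform distribution up to order $2N + 1$, hence up to order $N$. All function names are hypothetical.

```python
import math


def legendre_with_derivative(n: int, x: float):
    """Return (P_n(x), P_n'(x)) for n >= 1 and -1 < x < 1 via the recurrence."""
    p_prev, p = 1.0, x
    for k in range(2, n + 1):
        p_prev, p = p, ((2 * k - 1) * x * p - (k - 1) * p_prev) / k
    return p, n * (x * p - p_prev) / (x * x - 1.0)


def gauss_legendre(n: int):
    """Nodes and weights of the n-point Gauss-Legendre rule on [-1, 1]."""
    nodes, weights = [], []
    for i in range(1, n + 1):
        x = math.cos(math.pi * (i - 0.25) / (n + 0.5))  # standard initial guess
        for _ in range(100):  # Newton iterations on P_n
            p, dp = legendre_with_derivative(n, x)
            step = p / dp
            x -= step
            if abs(step) < 1e-15:
                break
        _, dp = legendre_with_derivative(n, x)
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights


def discrete_moment_matcher(a: float, n_moments: int):
    """Atoms and probabilities of F' on [-a, a] with n_moments + 1 support points."""
    nodes, weights = gauss_legendre(n_moments + 1)
    return [a * x for x in nodes], [w / 2.0 for w in weights]


def uniform_moment(a: float, j: int) -> float:
    """j-th moment of the uniform distribution on [-a, a]."""
    return 0.0 if j % 2 else a**j / (j + 1)


atoms, probs = discrete_moment_matcher(2.0, 6)
discrete_moments = [sum(p * th**j for th, p in zip(atoms, probs)) for j in range(1, 7)]
```

For a general $F$ on $[-a, a]$ no explicit quadrature rule is available, which is why the proof relies on an existence result instead.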
By the moment matching condition (A.11),
\[
|\hat F(t) - \hat F'(t)| \le \int_{-a}^{a} \frac{|t\theta|^{N}}{N!} \min\left\{ \frac{|t\theta|}{N+1},\ 2 \right\} d(F + F')(\theta), \qquad t \in \mathbb{R}, \tag{A.12}
\]
where the inequality holds because $F$ and $F'$ have finite absolute moments
of any order, see, e.g., inequality (26.5) in Billingsley [2], page 343. By (2.1),
$\int_{\mathbb{R}} |\hat K(\sigma t)| \, dt < \infty$, hence $F * K_\sigma$ and $F' * K_\sigma$ can be recovered using the inversion
formula. By (A.12), $\|F * K_\sigma - F' * K_\sigma\|_\infty \le 2a^{N}/(\pi N!) \int_{\mathbb{R}} |t|^{N} |\hat K(\sigma t)| \, dt$.
Next, we distinguish the case where $S_K < \infty$ from the case where $S_K = \infty$.
If $S_K < \infty$, by the assumption that $K$ satisfies (2.1),
If Sk < oo, by the assumption that K satisfies (2.1), 



\F * K a - F' * KJ^ < 



N 



2 a 



t\ N \K(at)\dt 



\t\<S K /a 



<-[L + C(p, r)/V] 



a 



aN J ~ a 



for $N \ge \max\{\log(1/\epsilon),\ (a e^{2} S_K/\sigma)\}$. If $S_K = \infty$, by the Cauchy-Schwarz
inequality,



, „ 2a N (2kL 
F * K a - F' * KJoo < - 



< 



tt N\ \ a 



1 



1/2 



^2N e -2(pa\t\r dt Y /2 



a \2 l / r pa 



N 



[T{(2N + l)/r)] 1 l 2 
T(N + 1) 



Using the formula $\Gamma(az + b) \sim (2\pi)^{1/2} e^{-az} (az)^{az + b - 1/2}$ ($z \to \infty$ in $|\arg z| < \pi$,
$a > 0$),



IF * K a - F' * tfJU < 



1 / a 



3 JV(l-l/r) r ,-JV/r i y-iV(l-l/r)+(l/r-3/2)/2_ 
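The Gamma-function asymptotic invoked in this step, $\Gamma(az + b) \sim (2\pi)^{1/2} e^{-az} (az)^{az + b - 1/2}$, can be sanity-checked on the log scale for real $z$. The sketch below uses hypothetical function names and arbitrary illustrative values of $a$ and $b$; it only shows that the log-ratio of the two sides shrinks as $z$ grows.

```python
import math


def log_gamma_asymptotic(a: float, b: float, z: float) -> float:
    """log of sqrt(2 pi) * exp(-a z) * (a z)^(a z + b - 1/2), for real z > 0."""
    az = a * z
    return 0.5 * math.log(2.0 * math.pi) - az + (az + b - 0.5) * math.log(az)


def log_ratio(a: float, b: float, z: float) -> float:
    """log Gamma(a z + b) minus the log of the asymptotic expression."""
    return math.lgamma(a * z + b) - log_gamma_asymptotic(a, b, z)


# With r = 2, Gamma((2N + 1)/r) has a = 2/r = 1 and b = 1/r = 0.5 in z = N.
gaps = [abs(log_ratio(1.0, 0.5, z)) for z in (10.0, 100.0, 1000.0, 10000.0)]
```

The log-ratio decays roughly like $1/z$, so the ratio of the two sides tends to 1, which is what the symbol $\sim$ asserts.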



If < r < 1 and {pa / 'a)" 1 '' / ( 1 ~ r ) = 0(log(l/e)), for 



1\ „ /<7\r/(l-r) 

log- <AT< 
e / Va, 



we have 

11^*^-^*^1100 

< 1^(1^-3/2)72 exp J -iv lQg 



a 



pa J a 
iV"iA-i 



1 1 1 
1 h - log - 



<£ 
If $r = 1$ and $a/(\rho\sigma) < e^{-1}$, for $N = \log(1/\epsilon)$,

\\F * K a - F' * < - (—) 

a \paj 

If $r > 1$ and $a/(\rho\sigma) > e^{-1}$, for



A' 



< 



a 



N < max 



log' 



r/(r-l) 



we have 
IIF*^-^*^! 



^ -exp 

O" 



log 



a/(pa) r 



-(r — 1 — log r) 



<£ 



and the proof is complete. 



□ 



Remark A.2. Even if stated for a probability measure $F$ supported on
a symmetric interval $[-a, a]$, Lemma A.6 holds for every $F$ with $\mathrm{supp}(F)$
being any compact interval.

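Lemma A.7. Let $f$ and $g$ be probability densities on $\mathbb{R}$. For every $v \in
(0, 1]$ such that $\int f^{v} \, d\lambda < \infty$, we have $\|f - g\|_1 \le 2 \|f - g\|_\infty^{1-v} \int f^{v} \, d\lambda$.

Proof. Write $\|f - g\|_1 = 2 \int (f - g)_+ \, d\lambda \le 2 \int \min\{f, \|f - g\|_\infty\} \, d\lambda \le
2 \|f - g\|_\infty^{1-v} \int f^{v} \, d\lambda$. The assertion follows. □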
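As an illustration of Lemma A.7, the sketch below evaluates both sides of the inequality on a grid for two normal densities. The choice of densities, the grid and all function names are assumptions made for the example; the integrals are approximated by the trapezoidal rule, so this is a numerical sanity check rather than a proof.

```python
import math


def normal_pdf(x: float, mu: float, sd: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def trapezoid(values, h: float) -> float:
    """Trapezoidal rule for equally spaced samples with spacing h."""
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))


def lemma_a7_sides(v: float, lo: float = -12.0, hi: float = 12.0, num: int = 24001):
    """Return (||f - g||_1, 2 ||f - g||_inf^(1 - v) * int f^v) on a grid.

    Illustrative choice: f = N(0, 1) and g = N(0.5, 1).
    """
    h = (hi - lo) / (num - 1)
    xs = [lo + i * h for i in range(num)]
    f = [normal_pdf(x, 0.0, 1.0) for x in xs]
    g = [normal_pdf(x, 0.5, 1.0) for x in xs]
    diffs = [abs(fi - gi) for fi, gi in zip(f, g)]
    l1 = trapezoid(diffs, h)
    sup = max(diffs)
    int_fv = trapezoid([fi**v for fi in f], h)
    return l1, 2.0 * sup ** (1.0 - v) * int_fv


sides = {v: lemma_a7_sides(v) for v in (0.25, 0.5, 0.75, 1.0)}
```

For $v = 1$ the bound reduces to the trivial $\|f - g\|_1 \le 2$; smaller values of $v$ bring in the sup-norm distance at the cost of the larger factor $\int f^{v} \, d\lambda$.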

As noted in Devroye [5], Remark 3, page 2042, if
\[
\text{for some } q > 0, \quad E_f[|X|^{q}] < \infty, \tag{A.13}
\]
then $\int f^{v} \, d\lambda < \infty$ whenever $v > (1 + q)^{-1}$. For example, condition (A.13)
is verified for a Student's-$t$ distribution with $\nu$ degrees of freedom when
$0 < q < \nu$.

The inequality of the following lemma can be proved similarly to the one 
for the Gaussian kernel, see, e.g., the first part of Lemma 1 in Ghosal et 
al. [8], pages 156-157. 

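Lemma A.8. Let $K$ be a probability density on $\mathbb{R}$, bounded and symmetric
around 0. For every $\sigma > 0$ and every $\theta_j, \theta_k \in \mathbb{R}$,
\[
\|K_\sigma(\cdot - \theta_j) - K_\sigma(\cdot - \theta_k)\|_1 \le 2 \|K\|_\infty \frac{|\theta_j - \theta_k|}{\sigma} \lesssim \frac{|\theta_j - \theta_k|}{\sigma}.
\]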
In the next lemma, a sufficient condition is provided for the $L^1$-distance
between kernel mixtures with different variances to be bounded above by
the distance between the variances.



Lemma A.9. Let $K$ be a probability density on $\mathbb{R}$ symmetric around 0
and monotone decreasing in $|x|$. For every probability measure $F$ on $\mathbb{R}$ and
every $\sigma, \sigma' > 0$, we have $\|F * K_\sigma - F * K_{\sigma'}\|_1 \le \|K_\sigma - K_{\sigma'}\|_1 \le 2|\sigma - \sigma'|/(\sigma \wedge \sigma')$.



Proof. Note that
\[
\|F * K_\sigma - F * K_{\sigma'}\|_1 \le \int_{\mathbb{R}} \|K_\sigma(\cdot - \theta) - K_{\sigma'}(\cdot - \theta)\|_1 \, dF(\theta) = \|K_\sigma - K_{\sigma'}\|_1 .
\]
The second inequality in the assertion can be proved as in Norets and Pelenis [26], page 18. □
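The two inequalities of Lemma A.9 can be illustrated numerically. The sketch below assumes a standard normal kernel and a three-point mixing distribution, both chosen only for the example, and approximates the $L^1$-distances by the trapezoidal rule; the names are hypothetical.

```python
import math


def kernel(x: float, sigma: float) -> float:
    """Rescaled standard normal kernel K_sigma(x) = K(x / sigma) / sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


# Illustrative discrete mixing distribution F (atoms and probabilities).
ATOMS = [-2.0, 0.3, 1.5]
PROBS = [0.2, 0.5, 0.3]


def mixture(x: float, sigma: float) -> float:
    """Density of F * K_sigma at x."""
    return sum(p * kernel(x - th, sigma) for th, p in zip(ATOMS, PROBS))


def l1_distance(f, g, lo: float = -15.0, hi: float = 15.0, num: int = 30001) -> float:
    """Trapezoidal approximation of the L1 distance between f and g."""
    h = (hi - lo) / (num - 1)
    vals = [abs(f(lo + i * h) - g(lo + i * h)) for i in range(num)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))


def compare(sigma: float, sigma_prime: float):
    """Return (mixture distance, kernel distance, upper bound of the lemma)."""
    mix = l1_distance(lambda x: mixture(x, sigma), lambda x: mixture(x, sigma_prime))
    ker = l1_distance(lambda x: kernel(x, sigma), lambda x: kernel(x, sigma_prime))
    bound = 2.0 * abs(sigma - sigma_prime) / min(sigma, sigma_prime)
    return mix, ker, bound


results = [compare(s, sp) for s, sp in ((1.0, 1.2), (0.5, 0.6), (2.0, 1.0))]
```

Each triple should be non-decreasing from left to right: mixing can only reduce the distance between the rescaled kernels, and the kernel distance is in turn controlled by the relative difference of the scales.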



The next lemma provides an upper bound on the $L^1$-metric entropy of
sets of mixtures with super-smooth kernels. It is based on Lemma A.6,
Lemma A.8 and Lemma A.9 and can be proved similarly to Lemma 3 of
Ghosal and van der Vaart [12], pages 705-707, which deals with normal
mixtures.



Lemma A.10. Let $K$ be a probability density on $\mathbb{R}$ symmetric around 0
and monotone decreasing in $|x|$, with characteristic function satisfying (2.1)
for some constants $\rho, r > 0$ and $0 < L < \infty$. Let $0 < \epsilon < 1/5$. Let $0 < s <
S < \infty$ and $0 < a < \infty$ be such that, for some $\nu > 0$, $(a/s) \le (\log(1/\epsilon))^{\nu}$.
Define $\mathcal{F}_{a,s,S} := \{F * K_\sigma : F([-a, a]) = 1,\ s \le \sigma \le S\}$. Then,



( ^ ) 



where 



N< 



a I 1 



1/r 



a \r/(r-l) ( 1 

i) v log i 



, 2a \ 1 
log — + 1 + log - 

se I e 



if < r < 1, 
if r > 1. 



The following lemma is a slight variant of Lemma 6 in Ghosal and van 
der Vaart [12], page 711. 
Lemma A.11. Let $K$ be a probability density on $\mathbb{R}$ symmetric around 0.
Let $f$ be a strictly positive and bounded probability density, non-decreasing
on $(-\infty, a)$, non-increasing on $(b, \infty)$ and such that $f(x) \ge \ell > 0$ on $[a, b]$.
For every $\xi \in (0, 1)$, let $\tau_\xi > 0$ be such that $\int_0^{b-a} K_{\tau_\xi}(x) \, dx \ge \xi$. Then, for
every $\sigma \le \tau_\xi$, we have $f * K_\sigma \ge C_\xi f$, with $C_\xi := (\xi \ell/\|f\|_\infty) \in (0, 1)$.